[HN Gopher] Automatic Video Editing
       ___________________________________________________________________
        
       Automatic Video Editing
        
       Author : todsacerdoti
       Score  : 90 points
       Date   : 2021-03-23 12:41 UTC (10 hours ago)
        
 (HTM) web link (tratt.net)
 (TXT) w3m dump (tratt.net)
        
       | johnx123-up wrote:
       | Just curious.. how is this better than
       | https://davidbieber.com/snippets/2020-02-21-jump-cut-program... ?
        
       | netik wrote:
       | A lot of the issues raised in this post are well solved by
       | commercial video editing software.
       | 
       | For instance, markers and scene detection / edit lists in
       | DaVinci, or word by word editing with recognition in Descript.
       | 
       | I feel like the OP should give them a try before trying to apply
       | the universal script hammer to this problem. The ffmpeg line
       | alone is frightening.
       | 
       | Use of unix time instead of timecode is another problem here.
       | 
       | It's cool that he wrote this but it doesn't conform to any video
       | standards in use.
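
To make the Unix time versus timecode point concrete, here is a minimal sketch of turning a wall-clock marker into an HH:MM:SS:FF timecode relative to the start of a recording. The function names, the millisecond inputs, and the 25 fps default are assumptions for illustration; none of them come from the article, and drop-frame rates such as 29.97 are deliberately not handled.

```python
def ms_to_timecode(offset_ms: int, fps: int = 25) -> str:
    """Convert a millisecond offset into a non-drop-frame HH:MM:SS:FF string."""
    if offset_ms < 0:
        raise ValueError("offset must be non-negative")
    # Integer arithmetic avoids floating-point rounding surprises.
    total_frames = offset_ms * fps // 1000
    seconds, frames = divmod(total_frames, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def marker_to_timecode(marker_unix_ms: int, start_unix_ms: int, fps: int = 25) -> str:
    """Express a Unix-time marker as a timecode relative to the recording start."""
    return ms_to_timecode(marker_unix_ms - start_unix_ms, fps)
```

A marker taken 40 seconds after the recording started maps to `00:00:40:00`, whatever the absolute Unix timestamps were.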
        
         | moyix wrote:
         | The author seems to be using OpenBSD, so I imagine that rather
         | limits their options.
        
         | jjice wrote:
         | My guess is that it was written because the author much prefers
         | writing code and automating tasks to video editing, and it was
         | worth the extra time for them to do that.
         | 
         | Out of curiosity, do any commercial video editing programs
         | offer automation/scripting? That seems like a great place for
         | scripting to be available to help alleviate common tedious
         | tasks, if it's not already there. I think that could lead to a
         | best of both worlds.
        
           | kfarr wrote:
           | Yes there are many existing automation options for video
           | production applications. For example, the concept of a
           | portable EDL (edit decision list) file has been well
           | established in the industry for decades:
           | https://en.wikipedia.org/wiki/Edit_decision_list
           | 
           | Scripting languages are also available for specific post
           | production applications such as Final Cut:
           | https://en.wikipedia.org/wiki/FXScript
           | 
           | Then there grew a whole crop of cloud-based scripted content
           | generators like stupeflix, sundaysky, idomoo.com.
           | 
           | I think the author did a great job creating a fun project and
           | explaining the ffmpeg workflow, but a video professional has
           | many off-the-shelf options already.
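
As a rough illustration of what an edit decision list carries, the sketch below writes a list of kept segments as cut events in an approximation of the common CMX3600 text layout (event number, reel, track, transition, source in/out, record in/out). The `build_edl` helper, the `AX` reel name, and the 25 fps default are assumptions for the example; column spacing and header conventions differ between editors, so treat this as a starting point rather than a conforming exporter.

```python
def to_timecode(frame: int, fps: int) -> str:
    """Format a frame count as non-drop-frame HH:MM:SS:FF."""
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def build_edl(title, segments, fps=25, reel="AX"):
    """Build EDL text from (source_in_frame, source_out_frame) pairs, in order."""
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    record_in = 0  # position on the output timeline, in frames
    for number, (src_in, src_out) in enumerate(segments, start=1):
        if src_out <= src_in:
            raise ValueError("each segment must end after it starts")
        record_out = record_in + (src_out - src_in)
        lines.append(
            f"{number:03d}  {reel:<8} V     C        "
            f"{to_timecode(src_in, fps)} {to_timecode(src_out, fps)} "
            f"{to_timecode(record_in, fps)} {to_timecode(record_out, fps)}"
        )
        record_in = record_out  # the next event starts where this one ended
    return "\n".join(lines) + "\n"
```

Each event is a plain cut (`C`) on the video track (`V`); the record columns accumulate, so the kept segments sit back to back on the output timeline.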
        
           | netik wrote:
           | descript is the only thing I've seen that turns video editing
           | into something resembling word processing - it gets very jump
           | cut-y and jerky but most people are used to it in the
           | TikTok/YT era.
           | 
           | It's terrifically buggy right now but as a PoC it's amazing.
        
           | netik wrote:
           | oh, and scripting wise you can fully automate davinci with
           | python.
        
       | dceddia wrote:
       | I've been thinking a lot about this problem myself! Making
       | screencasts, I realized most of my editing time was spent cutting
       | out silence. If I recorded with that in mind (staying silent
       | between re-takes), the editing became pretty straightforward but
       | really tedious.
       | 
       | I made my own automatic editor tool [0] recently, and the
       | approach I started with is simply cutting out the silent parts.
       | 
       | I looked at other options like jumpcutter.py and auto-editor.py
       | before building it, and they helped prove out the idea, but I
       | really wanted an interactive UI, so I built an app in Swift.
       | 
       | I figured the automated approach is probably not gonna be
       | perfect, so I built it to export XML/EDL/ScreenFlow files that
       | can be imported to other editors for fine-tuning. It works pretty
       | well and people seem happy with it so far.
       | 
       | Someone else mentioned timestamps vs. timecode. Maaan, timing has
       | been a real thorn in my side with this project. The "even"
       | framerates aren't too bad (30fps/60fps/etc) but the uneven ones
       | are a huge pain (29.97, 59.94). One fun problem recently was
       | figuring out how to bin audio samples at 48 kHz into frames at
       | 29.97. Because each frame holds an uneven number (1601.6), I had
       | to alternate between assigning 1602/1601 to each frame, or else
       | my idea of time would slowly but steadily skew out of sync. As
       | someone who'd never worked with video before, this has been a fun
       | adventure.
       | 
       | Right now I'm working on adding more manual control over the
       | cuts, and this kind of stuff is what I'd like to tackle next!
       | Automatic scene detection, ability to leave markers during
       | recording, more control over transitions, stuff like that would
       | be really cool. Feels to me like automatic editing might get more
       | popular as more people realize it's even possible.
       | 
       | 0: https://getrecut.com
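
The sample-binning problem described above can be sketched with exact integer arithmetic. This is one possible approach, not necessarily what Recut does: treat 29.97 fps as the rational 30000/1001 and let each frame own the samples between two floored boundaries, so the rounding error can never accumulate. The function name and parameter names are made up for the example.

```python
def samples_in_frame(frame_index, sample_rate=48_000, fps_num=30_000, fps_den=1001):
    """Number of audio samples assigned to one video frame.

    Frame n covers samples [floor(n * k), floor((n + 1) * k)) where
    k = sample_rate * fps_den / fps_num (1601.6 for 48 kHz at 29.97 fps).
    """
    start = frame_index * sample_rate * fps_den // fps_num
    end = (frame_index + 1) * sample_rate * fps_den // fps_num
    return end - start


# First five frames: the counts repeat with a period of five frames.
counts = [samples_in_frame(i) for i in range(5)]
```

Under this scheme the counts are 1601, 1602, 1601, 1602, 1602 and then repeat, so every five frames hold exactly 8008 samples and the audio and video clocks stay aligned.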
        
         | jjnoakes wrote:
         | By "Automatic scene detection" do you mean somehow detecting in
         | the video if interesting things are going on and avoiding cuts
         | around those spots? Because a lot of videos I record have
         | silence in the audio when important things might be happening
         | in the video, and having to manually go find those places is a
         | bit of a pain.
         | 
         | Of course, automatically detecting which parts of the video are
         | interesting or not is probably impossible, but it sure feels
         | like an interesting problem to try to tackle.
        
           | dceddia wrote:
           | Something like that. I could base it on frame-to-frame
           | changes, or if my app was doing the screen recording, I could
           | look at keyboard/mouse input as another signal of "non-
           | silence".
           | 
           | I'm pretty much looking at silence removal as a good starting
           | point. My overall goal is to cut down on the manual editing
           | required, so just looking for repetitive processes that I
           | could add automations for.
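
A minimal sketch of silence removal as a starting point, assuming you already have one activity level per frame (for example the RMS of that frame's audio, or the maximum of several signals such as audio level, frame-to-frame change, and keyboard/mouse activity). The function name, the threshold, and the `min_silence` parameter are illustrative assumptions, not taken from any of the tools mentioned.

```python
def loud_segments(levels, threshold, min_silence=3):
    """Return [start, end) frame ranges to keep.

    A run of quiet frames is only cut when it lasts at least
    min_silence frames; shorter pauses stay inside the segment.
    """
    segments = []
    start = None    # first frame of the segment currently being kept
    quiet_run = 0   # consecutive quiet frames seen inside that segment
    for i, level in enumerate(levels):
        if level >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # Close the segment just after its last loud frame.
                segments.append((start, i - quiet_run + 1))
                start, quiet_run = None, 0
    if start is not None:
        segments.append((start, len(levels) - quiet_run))
    return segments
```

The resulting ranges are the parts to keep, which is the shape an editor import format expects, so they can be reviewed and adjusted by hand afterwards.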
        
         | clawoo wrote:
         | Your software is such a great idea. The UI could use some work
         | to be more attractive, but the core functionality is top notch.
         | 
         | Here's a suggestion- using Speech.framework[0] you could
         | probably quite easily transcribe the audio and identify filler
         | words ("umm", "hmm", etc) and add an option to automatically
         | exclude those as well.
         | 
         | [0] https://developer.apple.com/documentation/speech
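
Once a transcript with per-word timings exists, finding filler words is mostly bookkeeping. The sketch below assumes a hypothetical input of `(text, start_seconds, end_seconds)` tuples; that shape is an assumption for illustration and is not the actual output format of Speech.framework or any particular transcription service.

```python
FILLERS = {"um", "umm", "uh", "hmm", "er"}


def filler_cuts(words, fillers=FILLERS):
    """Return (start, end) time ranges covering filler words.

    words: iterable of (text, start_seconds, end_seconds) tuples.
    Back-to-back fillers are merged into a single range.
    """
    cuts = []
    for text, start, end in words:
        normalized = text.lower().strip(".,!?\"'")
        if normalized not in fillers:
            continue
        if cuts and start <= cuts[-1][1]:
            # Touches or overlaps the previous cut: extend it.
            cuts[-1] = (cuts[-1][0], max(end, cuts[-1][1]))
        else:
            cuts.append((start, end))
    return cuts
```

The ranges could be offered as optional cuts rather than applied blindly, since some fillers are worth keeping for a natural rhythm.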
        
           | dceddia wrote:
           | Thanks! I like the Speech framework idea. I've toyed with it
           | a bit and the results are not so great, especially offline, and
           | the online one has some limits. I think if I want to properly
           | add transcription I'll need to integrate with some SaaS
           | solution, but I need to do a little more experimenting first.
           | 
           | Do you have any specific suggestions or critiques for the UI?
           | I definitely agree it could be more attractive but I've had
           | trouble figuring out what to do besides "make it look like
           | Final Cut" or whatever. (or maybe I should actually do just
           | that!)
        
       | nate wrote:
       | Love stuff like this. It feels like we're close to some really
       | interesting things here but I haven't quite seen it yet.
       | Facebook/Apple have their "auto movies" but they're largely just
       | montages over music. Any interesting/useful audio captured in
       | those clips just seems ignored.
       | 
       | My brain cycles on thoughts of what GPT-3-like things could
       | possibly enable here. Could there be some interesting AI
       | trained on which clip should come next: use these 10s, or
       | skip and check again?
       | 
       | Self promotion, but I did fool with a way to try and automate
       | making stop motion movies: https://www.trylocomotion.com
       | 
       | Rudimentary process of letting people take a video and doing a
       | reverse motion detection algorithm. "When there's nothing moving
       | in frame, use that frame for the stop motion movie." But that was
       | a fun dive into this world.
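
The "reverse motion detection" idea can be sketched with plain frame differencing: compare each frame with the previous one and pick a frame whenever the picture has stopped changing. This is an illustrative guess at the approach, not the actual implementation behind the linked project; frames are assumed to be equal-length sequences of grayscale pixel values, and the threshold is arbitrary.

```python
def mean_abs_diff(a, b):
    """Average absolute per-pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def still_frames(frames, threshold=2.0):
    """Return one frame index per run of motionless frames."""
    picked = []
    in_still_run = False
    for i in range(1, len(frames)):
        is_still = mean_abs_diff(frames[i - 1], frames[i]) < threshold
        if is_still and not in_still_run:
            picked.append(i)  # first frame after the motion stopped
        in_still_run = is_still
    return picked
```

Taking only the first frame of each still run yields one picture per pose, which is what a stop-motion sequence needs.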
        
       | bluetwo wrote:
       | "Aeschylus is the worst written bit of software I've put my name
       | to since I was about 15 years old and it will probably never be
       | usable for anyone other than me."
       | 
       | Hilarious and honest.
        
       | PeterisP wrote:
       | This reminds me of a cool tech demo about "enhanced tool" for
       | video editing I saw in January -
       | https://www.youtube.com/watch?v=Bl9wqNe5J8U from descript.com (no
       | affiliation).
        
       | nickjj wrote:
       | After having recorded close to 600 screencast videos, I automated
       | a number of setup and teardown processes too.
       | 
       | Such as using the Sizer[0] tool to move windows to specific
       | 1920x1080 coordinates of the screen where I configured OBS to
       | record from. This way my desktop resolution never needs to
       | change. Using Sizer requires right clicking a title bar and
       | choosing a pre-created menu item and it auto-resizes and
       | positions the window correctly. Very painless.
       | 
       | But I also have these little shell scripts that are responsible
       | for setting up font sizes and making sure my history is clear.
       | Not showing history is so important if you're using CTRL+r and
       | FZF frequently because having to blur stuff later is time
       | consuming and error prone (ie, missing 1 frame of blur by
       | accident). The stop record script reverts everything back to
       | normal.
       | 
       |     record-start () {
       |         mv ~/.bash_history ~/.bash_history.bak && history -c
       |         rm /tmp/%*
       |         change_terminal_font 9 18
       | 
       |         if [[ "${1}" = "--obs" ]]; then
       |             cd "/c/Program Files/obs-studio/bin/64bit"
       |             wslview obs64.exe
       |             cd -
       |         fi
       |     }
       | 
       |     record-stop () {
       |         mv ~/.bash_history.bak ~/.bash_history && history -r
       |         change_terminal_font 18 9
       |     }
       | 
       |     change_terminal_font () {
       |         [[ -z "${1}" || -z "${2}" ]] && echo "Usage: change_terminal_font FROM_SIZE TO_SIZE"
       | 
       |         from="${1}"
       |         to="${2}"
       |         windows_user="$(powershell.exe '$env:UserName' | sed -e 's/\r//g')"
       |         terminal_config="/c/Users/${windows_user}/AppData/Local/Packages/Microsoft.WindowsTerminal_8wekyb3d8bbwe/LocalState/settings.json"
       | 
       |         perl -i -pe "s/\"fontSize\": ${from}/\"fontSize\": ${to}/g" "${terminal_config}"
       |     }
       | 
       | I don't think I'll ever automate the editing process because
       | editing is where you can throw in a lot of human nice touches,
       | like zooming into a specific area of the screen for emphasis or
       | adding an overlay picture for context.
       | 
       | But I do try to make things as live as possible, such as using
       | OBS scenes to cut down on post processing editing. That and
       | automating your audio processing so you don't need to edit your
       | audio afterwards has given me the biggest bang for my buck in
       | terms of how fast I can go from an idea in my head to a video
       | ready for YouTube.
       | 
       | A complete list of tools that I use for dev + recording + editing
       | can be found here: https://nickjanetakis.com/blog/the-tools-i-use
       | 
       | [0]: http://www.brianapps.net/sizer4/
        
         | stevenicr wrote:
         | checked your tools page for "zoom" with ctrl-f - wondering what
         | you are using for zooming in.
         | 
         | some years ago I had a microsoft mouse that included a third
         | button and a driver-addon (I think) - that gave a great
         | magnified box you could move around the screen until you
         | unclicked the third button.
         | 
         | I would love to find a way to do this magnified box again -
         | when finding zoom in your comments, should I assume that you
         | are using camtasia and doing it in post?
         | 
         | I'd love to have this zoom ability for making videos but also
         | when screen sharing live.
        
           | nickjj wrote:
           | Live zooming is something I tried but ultimately stopped
           | doing it because it's too difficult to live code + narrate my
           | thought process + zoom in on demand. Maybe if I had a foot
           | pedal or something to control it heh.
           | 
           | I zoom in post production during the editing process. Once
           | you get used to your tools it's fast. It takes about a minute
           | to zoom into a specific area of the screen, position it in
           | the exact spot I want and then eventually zoom out back to
           | normal. I like this process because it lets you adjust the
           | zoom transition speed as needed and sometimes I also offset
           | the X / Y coords to center it, etc..
        
         | SeanFerree wrote:
         | I agree. I could never have editing automated. It takes a
         | while, but like you said, it adds a personal touch
        
         | EricE wrote:
         | I dunno - tools like this can take care of the 80% drudgery,
         | freeing you to really focus on that 20% that provides the human
         | touch :)
         | 
         | Unless you are also recording all your streams before you make
         | your on the fly director's cut with OBS, if you make an "ooops"
         | you're done. With his approach if you don't like the automatic
         | edits, the underlying source files are still there and you can
         | override the automation.
         | 
         | It would be easier if he did the automation within a
         | traditional NLE workflow; overriding the automatic editing
         | would be a lot easier. Since, ya know, editors were designed to
         | make and keep track of changes (ha!)
        
           | nickjj wrote:
           | I think it comes down to recording styles too.
           | 
           | My work flow is to start recording with OBS. Since I use a
           | webcam in the corner while recording my screen I'm aiming for
           | as few cuts as possible because with a webcam, unless your
           | face is positioned exactly how it was before watchers will
           | see the jump cut (even if it's subtle). Editing where to cut
           | manually to produce the least visible cut is an art form and
           | takes a human touch.
           | 
           | But I'll press record and do my best. If I get let's say 5
           | minutes out of a 20 minute video down solid but screw up then
           | I'll stop the recording. Then I'll start recording another
           | file with OBS and resume where I left off trying to place my
           | mouse cursor exactly where it was and lead off by saying what
           | I was saying before so it flows.
           | 
           | In the end I might have 2-5 relatively good videos that I
           | then edit together using an NLE tool. Knowing where the cuts
           | are is easy because it's pretty much the beginning of the
           | file to the end.
           | 
           | This also helps reduce massive file sizes where you end up
           | with like a single 45 minute source video that gets edited
           | down to 20 minutes because you made a ton of little mistakes.
           | At a decent scale this matters because disk space while cheap
           | isn't free for someone who is just a solo developer and does
           | everything from 1 dev box. Usually by the end of a recording
           | session I'll have like 15 source videos that I delete because
           | I know they came out bad (often times getting into the first
           | few minutes is the hardest for me), here's a screenshot of
           | what I mean haha
           | https://twitter.com/nickjanetakis/status/1347574482714685441.
           | 
           | Normally I edit my stuff at 2x speed. If I'm not adding a lot
           | of extra effects (zooming, tooltips, highlights, blurs,
           | overlays, etc.) it goes by pretty fast. With work flows that
           | you're used to editing really isn't that bad. It's creating
           | the content / material and executing the human part
           | (delivering the video -- execution basically) that takes most
           | of the time.
           | 
           | Then there's also editing things like an audio only group
           | podcast. There's no way an automated editing process is going
           | to be able to intelligently remove ums and ahs while leaving in
           | a few at key places to make things sound more natural, or
           | maybe removing a bit of stuttering from someone's line in a
           | way that no one would ever notice. Or perhaps cutting out 2
           | minutes altogether because it doesn't add much to the
           | conversation and isn't referenced later so it's safe to cut
           | and no one would ever know.
        
       ___________________________________________________________________
       (page generated 2021-03-23 23:01 UTC)