[HN Gopher] What I've Learned in the Past Year Spent Building an...
       ___________________________________________________________________
        
       What I've Learned in the Past Year Spent Building an AI Video
       Editor
        
       Author : burningion
       Score  : 134 points
       Date   : 2024-09-23 20:05 UTC (1 days ago)
        
 (HTM) web link (www.makeartwithpython.com)
 (TXT) w3m dump (www.makeartwithpython.com)
        
       | mips_avatar wrote:
       | I agree that building AI on top of the video editor is probably a
       | mistake. Maybe the format of the representation of the video can
       | be something better than a series of matrices of pixel values.
        
         | arjunaaqa wrote:
         | Absolutely true! Reimagined, AI-first products will kill
         | legacy products with AI patched on.
         | 
         | Always.
        
           | brianjking wrote:
           | I think this is sometimes true, and certainly after a ton of
           | failure first.
        
       | 35mm wrote:
       | As someone who has worked as a video editor, the most helpful AI
       | tool would be prompt based editing.
       | 
       | For example "find all the interview sections where people are
       | talking about x and make a sequence".
       | 
       | OpusClip claims to have this but it's behind a waitlist.
        
         | yunohn wrote:
         | Not a personal jab, but I am astounded how every day, HN is
         | full of discussion around how articles, newsletters, podcasts,
         | and videos need to be aggregated and summarized for actual
         | consumption. Repeat ad infinitum in both directions.
         | 
         | In my experience, I've always listened to live discussions or
         | read long form blog posts, specifically for the story and
         | obscure points being made. Summaries never capture that and
         | always miss nuances.
        
           | mjburgess wrote:
           | It has a lot to do with the kinds of articles that appear on
           | HN and across the internet. And also, that spending time on
           | something requires being interested in it, and so, there's a
           | larger audience for summaries.
           | 
           | I think, in general, most people have areas of interest to
           | them where it would not occur to them to summarise what
           | they're having fun engaging with.
        
           | pjc50 wrote:
           | Not sure about articles, but people keep recommending multi-
           | hour-long podcasts and videos _far_ beyond the ability of any
           | employed person to keep up with what they might want, so a
           | summary is a useful tool to extract the salient points and
           | possibly consider if something meets the threshold of being
           | better than all the other hour-long things I might want to
           | spend my free hour on.
           | 
           | It sometimes feels like media has bifurcated into hyper-dense
           | (let me explain a whole field of law in a 30 second tiktok)
           | versus hyper-fluffy (documentary with 30 minutes of material
           | spread out into six episodes, with a recap before and after
           | each commercial break), depending on whether the target
           | audience has a job or not.
        
             | reportgunner wrote:
             | Sounds like you're suffering from FOMO if you feel the need
             | to consume summaries of multi-hour content you don't have
             | time to consume.
        
               | ziddoap wrote:
               | Or they are just interested in the content?
        
               | reportgunner wrote:
               | I doubt it.
        
               | acdha wrote:
               | It's also a change in market dynamics. Professional
               | podcasters sell ads, so they need lots of content, and the
               | pivot to video or podcasts, which advertisers drove,
               | means that things which a decade ago would have been a
               | blog post taking 15 minutes to read are now a commitment
               | of an hour or more for the same amount of information.
               | 
               | This is a common complaint here because HN is so text
               | heavy that you're not going to find many people here who
               | can't read much faster than the average speaker can
               | present information.
        
               | reportgunner wrote:
               | Yeah that's what I meant by spam.
        
               | acdha wrote:
               | If that's what you meant, you didn't say it, and it's not
               | spam by the normal definition of that term.
        
               | reportgunner wrote:
               | Oh sorry I was talking about my other comment under this
               | post, my bad.
        
           | kombookcha wrote:
           | It's approaching a very strange situation where people make
           | overly wordy and bloated AI generated content and other
           | people try to use AI to compress it back into useful pellets
           | vaguely corresponding to the actual prompts used to generate
           | the initial content. Which were the only bits anybody cared
           | about in the first place.
           | 
           | One guy pays the AI to dig a hole, the other guy pays the AI
           | to fill in the hole. Back and forth they go, raising the GNP
           | but otherwise not accomplishing anything.
        
             | ta1243 wrote:
             | I haven't worried about search engines since I was trying
             | to get my site into Yahoo, but my understanding is that
             | they rank long flowery prose far higher than things that
             | are straight to the point.
             | 
             | There's then the added "benefit" of being able to put more
             | adverts in such long text.
             | 
             | One of the main appeals of chatgpt is it just gives you the
             | answer
        
               | dijksterhuis wrote:
               | *an answer
               | 
               | Not necessarily _the_ answer
        
               | ta1243 wrote:
               | So no different to searching online and finding some
               | random page then. In my experience chatgpt is usually far
               | more accurate, and as it gets right to the point you have
               | far more time to understand if the answer is reasonable
        
               | skydhash wrote:
               | No one searches online for a random page. You search for
               | something you may or may not find. You don't go in a
               | library looking for Jules Verne and get out with any
               | random book. I can agree that search engines may be bad,
               | but they don't create web sites out of thin air.
        
               | ta1243 wrote:
               | I wanted to know when the clocks went back in the US and
               | UK earlier.
               | 
               | ---------
               | 
               | when do clocks go back uk and us
               | 
               | ChatGPT said:
               | 
               | In 2024, clocks go back on Sunday, October 27th in the UK
               | and most of Europe, marking the end of Daylight Saving
               | Time (DST). At 2:00 AM, clocks are set back one hour to
               | 1:00 AM, giving people an extra hour of sleep. This marks
               | the shift back to Standard Time and will last until
               | spring when clocks go forward again.
               | 
               | In the United States, the clocks will go back a bit
               | later, on Sunday, November 3rd, 2024.
               | 
               | ---------
               | 
               | Compare that to using a search engine to find this out,
               | which involves one search, then clicking another page,
               | then finding out the dates for the UK, then searching for
               | the US: multiple pages, multiple paragraphs of text.
               | 
               | First result was the evening standard
               | 
               | ---------
               | 
               | What date do clocks go back in 2024 and when does British
               | Summer Time end?
               | 
               | Brits will get an extra hour of sleep from next month as
               | the days get shorter and shorter.
               | 
               | The temperatures are starting to drop, marking the end of
               | summer - even if it's not going quietly. Nonetheless,
               | autumn is well and truly on the way and that also marks
               | the end of British Summer Time (BST).
               | 
               | For those who aren't a fan of dark mornings, that means
               | you'll gain one hour of sleep.
               | 
               | The custom of changing the clocks twice a year has been
               | around in the UK for over a century, taking place once in
               | March and once in October.
               | 
               | There's still a little while until the clocks change but
               | the date is already known, as it always happens on the
               | last Sunday of October.
               | 
               | In 2019, the European Parliament voted to scrap mandatory
               | daylight saving but Britain has no plans to, err, see the
               | light.
               | 
               | This is what it all means for the UK.
               | 
               | When do the clocks go back?
               | 
               | The clocks go back on Sunday, October 27 at 2am.
               | 
               | ---------
               | 
               | All that nonsense to parse and I still haven't got the US
               | date
        
               | nonameiguess wrote:
               | Strange experience. I tried to replicate it by typing "US
               | daylight savings time" into my URL bar and Duck Duck Go's
               | summary blurb at the top of the results says "Daylight
               | Savings Time Ends Sunday, November 3rd, 2024" and the
               | first result is Wikipedia. Without even following it, the
               | summary on the search page says "in the US, daylight
               | savings time begins on the second Sunday in March and
               | ends on the first Sunday in November."
               | 
               | Hacker News commenters seem to consistently have far more
               | trouble searching for things than I do and I don't get
               | it.
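The rules quoted in this subthread (the Evening Standard's "last Sunday of October" for the UK, and Wikipedia's "first Sunday in November" for the end of US daylight saving time) reduce to simple date arithmetic. A minimal sketch using only the Python standard library; the function names are hypothetical, and it ignores the exact 2 AM local transition times.

```python
import calendar
from datetime import date, timedelta

def last_sunday_of_october(year: int) -> date:
    # Start from October 31 and walk back to the nearest Sunday (calendar.SUNDAY == 6).
    d = date(year, 10, 31)
    return d - timedelta(days=(d.weekday() - calendar.SUNDAY) % 7)

def first_sunday_of_november(year: int) -> date:
    # Start from November 1 and walk forward to the first Sunday.
    d = date(year, 11, 1)
    return d + timedelta(days=(calendar.SUNDAY - d.weekday()) % 7)

print("UK clocks go back:", last_sunday_of_october(2024))    # 2024-10-27
print("US clocks go back:", first_sunday_of_november(2024))  # 2024-11-03
```

Both dates match the ones ChatGPT and the search results gave above.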
        
               | skydhash wrote:
               | They do questions-based, not query-based search. The
               | trick is knowing the right keywords, which is fairly
               | easy.
        
               | gosub100 wrote:
               | Tiny nit: it's daylight saving time.
        
               | skydhash wrote:
               | Because a search engine is not an answer engine. I just
               | typed 'daylight saving time uk' and 'daylight saving time
               | us' and the answer was right at the top [0].
               | 
               | You're supposed to give a query, not a question (even
               | though Google et al. have worked hard to trick people
               | into that). Which is why search engines work for me even
               | if there are a lot of garbage-filled sites.
               | 
               | [0]: https://ibb.co/GpZ19nK (screenshot)
        
               | mschuster91 wrote:
               | > Because a search engine is not an answer engine.
               | 
               | People have come to expect that though, and until a few
               | years ago Google had actually gotten _really good_ at it,
               | partially because people finally started using structured
               | metadata to give context.
        
               | msabalau wrote:
               | Hmmm, not entirely certain about that metaphor.
               | 
               | I do that sort of thing all the time. Sure it is nice to
               | walk out with the Verne, but I am quite certain that I'll
               | probably be walking out with several random books, with
               | or without the one I was looking for.
        
               | downWidOutaFite wrote:
               | It's common. I get annoyed at my wife all the time for
               | jumping to conclusions from some random piece of web
               | info.
        
               | acdha wrote:
               | It's clearly different in that ChatGPT sounds
               | authoritative but you still have to track down sources
               | and make sure they're correctly summarized and accurate.
               | Search doesn't give you the impression that you're doing
               | anything else but ChatGPT always sounds authoritative
               | even when it's wrong, which makes it a hazard for the
               | people who need it the most because they don't have the
               | personal expertise to recognize when it goes off track.
        
               | ta1243 wrote:
               | And webpages always sound authoritative even when they're
               | wrong.
        
               | acdha wrote:
               | There's a key difference to understand: web pages have
               | individual reputation. If I see something about the moon
               | landings on NASA.gov I assign it a different trust level
               | than something I read on youcanthandlethetruth.social,
               | whereas LLM output comes with the imprimatur of the
               | company which made the system. Some LLMs do generate
               | citations, but those citations don't always exist, come
               | from authoritative sources, or say what they're listed as
               | saying, and users are notoriously prone to not checking
               | unless they're primed to be suspicious.
        
           | torginus wrote:
           | You don't understand! I need to procrastinate more
           | efficiently!
        
           | reportgunner wrote:
           | People use these summaries to generate spam which they sell
           | to advertising networks, that's why they keep talking about
           | it.
        
           | giancarlostoro wrote:
           | That's fair, and there will always be people who want
           | summaries.
        
           | cultureswitch wrote:
           | I generally agree with you when it comes to learning-focused
           | content but there are definite cases where using an AI
           | summary makes a lot of sense.
           | 
           | Imagine searching for a guide on how to disassemble your
           | laptop. Unfortunately, you can only find a 30 minute video
           | which is full of rambling, ads or other things irrelevant to
           | you. You can at least in theory use AI to produce a textual
           | summary which contains only the disassembly instructions and
           | relevant snapshots of the video.
           | 
           | All professionals I've ever talked to seem to agree that
           | videos are a terrible form of reference information (i.e. you
           | need information to accomplish a task right now).
           | 
           | The same applies to recipe websites: an AI that can throw all
           | the fluff away is useful considering the annoying habit of
           | the authors to seemingly write about everything but
           | ingredients and the steps necessary to cook the dish.
           | 
           | I think this relates to the
           | https://nick.groenen.me/posts/the-4-types-of-technical-
           | docum... as in any documentation that serves immediate work
           | rather than learning should be straight to the point with as
           | little clutter as possible.
        
           | authorfly wrote:
           | I totally agree. What is life if you only live on summaries?
           | 
           | Podcasts and blog posts fall into "unique
           | value/view/information I am learning" or entertainment
           | "something that feels like a (parasocial) friend - content I
           | can predictably expect and get some dopamine/sense of
           | socialness from".
           | 
           | Summaries for the former remove the eureka moments and brain
           | connections between ideas, replacing them with takeaways, and
           | summaries for the latter are like summarizing a TV episode in
           | text: no entertainment tends to really come from it.
           | 
           | I think it comes from having many messages at work, and I get
           | that. When you have 50-100 messages/documents a day, quick
           | summaries are a lifesaver, they help you filter, avoid, or
           | get to the facts. But for things I choose to listen to, for
           | those hours of rest or (scientific) curiosity in my life,
           | summaries are not a virtue.
           | 
           | (for Parasocial - the feeling is: This person won't update me
           | on their relationship problems, they'll explain a cool thing
           | about castles to me and share their opinion, etc.)
        
           | exe34 wrote:
           | I don't read much online drivel, but the way I would describe
           | my interest in AI summary/model building, is that I do read a
           | few articles/books deeply, but these refer to many other
           | things that it would be useful to have a general picture of
           | in my mind, but I'm never going to put the manual effort into
           | building that surrounding structure.
           | 
           | E.g. I'm interested in classical art, and come across a lot
           | of "he painted this while he was in $X before he moved to
           | $Y". I'd like information about $X and $Y to be also
           | available, how far apart are they, were they ruled by the
           | same people, etc. But I won't be doing that sort of digging
           | myself, I'd like it to show up next to what I'm reading,
           | because I (will) have an AI reading along and doing this work
           | for me.
        
         | tylerekahn wrote:
         | Check out https://kino.ai (YC S23)
        
         | burningion wrote:
         | Author here.
         | 
         | Yes, this is a big feature I've been working on, should be
         | ready for a beta by the end of the month.
         | 
         | I allude to it in the post, but good search (for editing) is a
         | challenge, and necessitates a mix of embeddings/vector search
         | and text models.
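The author doesn't describe the implementation. As an illustration only, here is one common way to mix embedding similarity with text matching when searching transcribed clips: score each clip by a weighted sum of cosine similarity and keyword overlap. The clip records, the toy 3-dimensional embeddings, and the `alpha` weight are all hypothetical; a real system would get embeddings from a model and use a vector index rather than a linear scan.

```python
import math
from dataclasses import dataclass

@dataclass
class Clip:
    name: str
    transcript: str
    embedding: list[float]  # hypothetical precomputed embedding of the clip

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    # Fraction of query words that appear in the transcript.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_search(query, query_embedding, clips, alpha=0.7, top_k=3):
    # alpha weights semantic similarity; (1 - alpha) weights literal keyword matches.
    scored = [
        (alpha * cosine(query_embedding, c.embedding)
         + (1 - alpha) * keyword_score(query, c.transcript), c.name)
        for c in clips
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

clips = [
    Clip("interview_a", "we talked about funding the skate park", [0.9, 0.1, 0.0]),
    Clip("interview_b", "the weather was terrible that day", [0.1, 0.9, 0.2]),
    Clip("broll_c", "crowd noise", [0.2, 0.2, 0.9]),
]
print(hybrid_search("skate park funding", [0.8, 0.2, 0.1], clips, top_k=2))
```

Keeping a keyword term alongside the embedding term helps with exact names and jargon that embeddings can blur.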
        
           | liotier wrote:
           | Derushing in general is the most time-consuming part, so it
           | needs not only language pattern recognition but also image
           | recognition: "From the rushes, extract all the sequences with
           | bicycle crashes to give me a pile of clips to use in my edit!"
        
             | burningion wrote:
             | Yes, agreed.
             | 
             | I film a bunch of skateboarding, and it can take tens of
             | tries to land a trick. Similarly, there's usually a unique
             | sound that signals a trick was finally landed.
             | 
             | Good multi-modal search and discovery is a huge part of
             | cracking the editing problem.
        
               | liotier wrote:
               | Looks like https://kino.ai addresses that derushing
               | stage, but as a specialized tool rather than as a
               | function inside a video editor - which makes a lot of
               | sense to me.
        
               | sitkack wrote:
               | Detect the cheer everyone makes when the trick lands.
               | Lots of proxy indicators to key off of.
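The "detect the cheer" idea can be approximated with a simple loudness detector: compute RMS energy over short windows of the audio track and flag windows much louder than the typical level. A minimal pure-Python sketch on a synthetic signal; the 0.5 s window and the 4x-median threshold are arbitrary assumptions, and a real tool would first decode the audio from the footage.

```python
import math
import random
import statistics

def rms_windows(samples, window):
    # Root-mean-square energy of each non-overlapping window.
    return [
        math.sqrt(sum(s * s for s in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    ]

def loud_moments(samples, sample_rate, window_s=0.5, factor=4.0):
    window = int(sample_rate * window_s)
    energies = rms_windows(samples, window)
    baseline = statistics.median(energies)
    # Start times (seconds) of windows far louder than the median level.
    return [i * window_s for i, e in enumerate(energies) if e > factor * baseline]

# Synthetic 10 s clip at 8 kHz: quiet noise with a loud burst from 6.0 s to 7.0 s.
random.seed(0)
rate = 8000
audio = [random.uniform(-0.05, 0.05) for _ in range(rate * 10)]
for n in range(6 * rate, 7 * rate):
    audio[n] = random.uniform(-0.8, 0.8)
print(loud_moments(audio, rate))
```

Using the median as the baseline keeps a few loud windows from dragging the threshold up.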
        
               | trinix912 wrote:
               | Tens? It sometimes takes my crew hundreds of tries (all
               | on DV tapes).
               | 
               | How far have you gotten with searching for trick
               | variations? It would be interesting to see a system that
               | can reliably recognize what's switch, nollie vs fakie
               | etc. Then have it generate a list of all tricks for each
               | skater and perhaps outstanding fails. Just some thoughts.
        
           | nashashmi wrote:
           | > I allude to it
           | 
           | And that's why I read the comments to see if anyone mentioned
           | it.
           | 
           | To be able to literally take the source files used to put the
           | video together and edit each piece individually would be
           | great.
           | 
           | I wanted to create a car driving down a road covered in
           | arches of greenery. I got lots of great options, but I wanted
           | a particular combination of options. If I could do something
           | like that with video, that would be terrific.
        
         | klabb3 wrote:
         | As an outsider: sounds like the main value lies in the AI
         | extracting detailed and accurate (but heuristic) metadata from
         | video: audio transcriptions, text, people, environment and
         | objects.
         | 
         | Once that's there, you can use it for organizing, searching,
         | filtering, or whatever you want. It does not need to be coupled
         | with an LLM-based interface.
         | 
         | ML models for, e.g., face & object recognition have been deployed
         | in both local- and cloud based photo organization for at least
         | a decade. I very much welcome transformers to do a much better
         | job, but I also very much reject the everything-is-a-prompt
         | hammer as a solution to all problems. _Especially_ in deep and
         | professional workflows where details matter.
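The point above is that once a model has produced metadata, ordinary structured filtering is enough to query it. A minimal sketch of that idea; the `ClipMetadata` fields and the example records are hypothetical stand-ins for whatever a transcription or object-detection model would output.

```python
from dataclasses import dataclass, field

@dataclass
class ClipMetadata:
    path: str
    start_s: float
    end_s: float
    transcript: str = ""
    objects: set[str] = field(default_factory=set)  # e.g. from object detection
    people: set[str] = field(default_factory=set)   # e.g. from face recognition

def find_clips(clips, *, has_objects=(), person=None, says=None):
    """Plain filtering over model-extracted metadata; no LLM involved."""
    results = []
    for c in clips:
        if not set(has_objects) <= c.objects:
            continue
        if person is not None and person not in c.people:
            continue
        if says is not None and says.lower() not in c.transcript.lower():
            continue
        results.append(c)
    return results

library = [
    ClipMetadata("a.mov", 0.0, 4.2, "that was a sick kickflip", {"skateboard", "ramp"}, {"alex"}),
    ClipMetadata("b.mov", 12.0, 15.5, "", {"bicycle"}, {"sam"}),
    ClipMetadata("c.mov", 3.0, 9.0, "try it again", {"skateboard"}, {"sam"}),
]
for clip in find_clips(library, has_objects={"skateboard"}, person="sam"):
    print(clip.path, clip.start_s, clip.end_s)  # c.mov 3.0 9.0
```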
        
         | wk_end wrote:
         | You should check out scenery.video (disclaimer: I have a
         | relationship with the company)
        
       | sfmike wrote:
       | What do you think of this versus the AI that hires actors
       | who are then reused as models in the videos via script?
        
         | burningion wrote:
         | Author here. I imagine that being one of the components you can
         | "plug in" to what I'm building.
         | 
         | Imagine taking in a prompt, which describes the video you'd
         | like generated. At render time you pass along variables which
         | get injected to describe the specifics for your audience.
         | 
         | We can then adjust the video edit according to that audience,
         | including mixing generated and non-generated content.
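The render-time variable injection described above resembles ordinary template substitution. A hypothetical sketch with Python's `string.Template`; the prompt text and the variable names (`duration_s`, `audience`, `location`) are invented for illustration and are not taken from the author's product.

```python
from string import Template

# Hypothetical base prompt describing the video; $-placeholders are filled at render time.
BASE_PROMPT = Template(
    "A $duration_s-second highlight reel of street skateboarding for $audience, "
    "opening with b-roll shot in $location."
)

def render_prompt(**variables) -> str:
    # substitute() raises KeyError if any placeholder is missing, which surfaces
    # an incomplete audience spec before an expensive render starts.
    return BASE_PROMPT.substitute(**variables)

print(render_prompt(duration_s=30, audience="beginner skaters", location="Los Angeles"))
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice here: a missing variable fails loudly instead of leaking a literal `$placeholder` into the prompt.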
        
       | lukaqq wrote:
       | Impressive blog! I am building a professional web video editor -
       | https://chillin.online and trying to embed various AI workflows
       | into it. Your article has given me a lot of inspiration. Thank
       | you!
        
         | b-lee wrote:
         | Looks so interesting..
        
       | SCUSKU wrote:
       | Love the author sharing their winding journey as well as the
       | tools and things they learned along the way. You can tell the
       | author did grow a lot through this process, and through the year.
       | Great stuff, thanks for sharing these great tips :D
        
       | Narciss wrote:
       | Good work on pushing through. It's like you say, building
       | anything is an achievement.
        
         | ericmcer wrote:
         | Seriously, every person needs the opportunity to really throw
         | themselves into creating something for a year. I think so many
         | people walk around thinking "if only I had
         | time/money/space/whatever I could do something amazing".
         | 
         | It is really humbling to actually try it and realize how
         | difficult making anything original is. You also realize that...
         | you just might not be talented haha.
        
       | 1oooqooq wrote:
         | did i miss something, or is this "video editing was too hard so i
       | just made a Wikipedia reading bot that generates drivel for
       | Instagram and TikTok at the same time"?
        
         | burningion wrote:
         | Author here.
         | 
         | This is a genuine concern of mine! I don't want to build
         | something that generates slop.
         | 
         | Rather, I think whenever we change the costs / process of
         | things, new possibilities open up.
         | 
         | As an example, last night I re-watched Starship Troopers for
         | the six-hundredth time. I'm a huge fan of Paul Verhoeven.
         | 
         | What if I could watch a custom edit of Starship Troopers on
         | demand, and this edit surprised me with something new? I don't
         | know exactly how this would look, but maybe it's interesting?
         | 
         | It's tough to predict the future and how things will change.
         | 
         | But I'd rather be participating in its creation, trying to make
         | it better.
        
           | scudsworth wrote:
           | >What if I could watch a custom edit of Starship Troopers on
           | demand, and this edit surprised me with something new? I
           | don't know exactly how this would look, but maybe it's
           | interesting?
           | 
           | this is not interesting whatsoever actually
        
             | burningion wrote:
             | lol alright, appreciate the skepticism.
             | 
             | What if, instead of an algorithm designed to hold your
             | attention captive to sell you shit, there were a feed of
             | videos created to help you focus on what you aspire to
             | learn / be / do?
             | 
             | idk probably bullshit too, but why not?
        
               | hackable_sand wrote:
               | You have no idea what you're doing do you
               | 
               | Bfr
        
               | jeffmcjunkin wrote:
               | We don't advance as a society unless people ask new
               | questions. Having folk willing to spend some time
               | answering those questions (in public, no less!) helps
               | others. It's really, really damn hard to predict how
               | advancements in one area can help another.
               | 
               | All that said, thanks for your interesting new question,
               | and thanks for spending time on it :D
        
           | AlienRobot wrote:
           | >What if I could watch a custom edit of Starship Troopers on
           | demand, and this edit surprised me with something new? I
           | don't know exactly how this would look, but maybe it's
           | interesting?
           | 
           | Is this what you want to do?
           | https://www.youtube.com/watch?v=6sUR6ylVH7E
        
       | Arelius wrote:
       | Because I don't see it mentioned elsewhere, I wanted to plug
       | OpenTimelineIO, as a lot of the industry is building support
       | around it as a format right now, and it would be great for any
       | new video editor to support.
       | 
       | https://opentimelineio.readthedocs.io/en/stable/
        
       | mcdow wrote:
       | It seems like we are currently in the "skeuomorphic" product
       | design era for AI products. Which is to say we are building the
       | same products but with AI tacked on. I appreciate that you are
       | approaching this problem from first principles and attempting to
       | break from the model of the previous generation. Kudos.
        
       ___________________________________________________________________
       (page generated 2024-09-24 23:01 UTC)