[HN Gopher] What I've Learned in the Past Year Spent Building an...
___________________________________________________________________
What I've Learned in the Past Year Spent Building an AI Video
Editor
Author : burningion
Score : 134 points
Date : 2024-09-23 20:05 UTC (1 days ago)
(HTM) web link (www.makeartwithpython.com)
(TXT) w3m dump (www.makeartwithpython.com)
| mips_avatar wrote:
| I agree that building AI on top of the video editor is probably a
| mistake. Maybe the format of the representation of the video can
| be something better than a series of matrices of pixel values.
| arjunaaqa wrote:
| Absolutely true! Re-imagined AI-first products will kill
| AI-patched-up legacy products.
|
| Always.
| brianjking wrote:
| I think this is sometimes true, and certainly after a ton of
| failure first.
| 35mm wrote:
| As someone who has worked as a video editor, the most helpful AI
| tool would be prompt based editing.
|
| For example "find all the interview sections where people are
| talking about x and make a sequence".
|
| OpusClip claims to have this but it's behind a waitlist.
| yunohn wrote:
| Not a personal jab, but I am astounded how every day, HN is
| full of discussion around how articles, newsletters, podcasts,
| and videos need to be aggregated and summarized for actual
| consumption. Repeat ad infinitum in both directions.
|
| In my experience, I've always listened to live discussions or
| read long form blog posts, specifically for the story and
| obscure points being made. Summaries never capture that and
| always miss nuances.
| mjburgess wrote:
| It has a lot to do with the kinds of articles that appear on
| HN and across the internet. And also, that spending time on
| something requires being interested in it, and so, there's a
| larger audience for summaries.
|
| I think, in general, most people have areas of interest to
| them where it would not occur to them to summarise what
| they're having fun engaging with.
| pjc50 wrote:
| Not sure about articles, but people keep recommending multi-
| hour-long podcasts and videos _far_ beyond the ability of any
| employed person to keep up with what they might want, so a
| summary is a useful tool to extract the salient points and
| possibly consider if something meets the threshold of being
| better than all the other hour-long things I might want to
| spend my free hour on.
|
| It sometimes feels like media has bifurcated into hyper-dense
| (let me explain a whole field of law in a 30 second tiktok)
| versus hyper-fluffy (documentary with 30 minutes of material
| spread out into six episodes, with a recap before and after
| each commercial break), depending on whether the target
| audience has a job or not.
| reportgunner wrote:
| Sounds like you're suffering from FOMO if you feel the need
| to consume summaries of multi-hour content you don't have
| time to consume.
| ziddoap wrote:
| Or they are just interested in the content?
| reportgunner wrote:
| I doubt it.
| acdha wrote:
| It's also changes in market dynamics. Professional
| podcasters sell ads so they need lots of content, and the
| advertiser-driven pivot to video and podcasts means that
| things which a decade ago would have been a blog post
| taking 15 minutes to read are now an hour or more
| commitment for the same amount of information.
|
| This is a common complaint here because HN is so text
| heavy that you're not going to find many people here who
| can't read much faster than the average speaker can
| present information.
| reportgunner wrote:
| Yeah that's what I meant by spam.
| acdha wrote:
| If that's what you meant, you didn't say it and it's not
| spam by normal definition of that term.
| reportgunner wrote:
| Oh sorry I was talking about my other comment under this
| post, my bad.
| kombookcha wrote:
| It's approaching a very strange situation where people make
| overly wordy and bloated AI generated content and other
| people try to use AI to compress it back into useful pellets
| vaguely corresponding to the actual prompts used to generate
| the initial content. Which were the only bits anybody cared
| about in the first place.
|
| One guy pays the AI to dig a hole, the other guy pays the AI
| to fill in the hole. Back and forth they go, raising the GDP
| but otherwise not accomplishing anything.
| ta1243 wrote:
| I haven't worried about search engines since I was trying
| to get my site into yahoo, but my understanding is that
| they rank long flowery prose far higher than things that
| are straight to the point.
|
| There's then the added "benefit" of being able to put more
| adverts in such long text.
|
| One of the main appeals of chatgpt is it just gives you the
| answer
| dijksterhuis wrote:
| *an answer
|
| Not necessarily _the_ answer
| ta1243 wrote:
| So no different to searching online and finding some
| random page then. In my experience chatgpt is usually far
| more accurate, and as it gets right to the point you have
| far more time to understand if the answer is reasonable
| skydhash wrote:
| No one searches online for a random page. You search for
| something you may or may not find. You don't go in a
| library looking for Jules Verne and get out with any
| random book. I can agree that search engines may be bad,
| but they don't create web sites out of thin air.
| ta1243 wrote:
| I wanted to know when the clocks went back in the US and
| UK earlier.
|
| ---------
|
| when do clocks go back uk and us
|
| ChatGPT said:
|
| In 2024, clocks go back on Sunday, October 27th in the UK
| and most of Europe, marking the end of Daylight Saving
| Time (DST). At 2:00 AM, clocks are set back one hour to
| 1:00 AM, giving people an extra hour of sleep. This marks
| the shift back to Standard Time and will last until
| spring when clocks go forward again.
|
| In the United States, the clocks will go back a bit
| later, on Sunday, November 3rd, 2024.
|
| ---------
|
| Compare to using a search engine to find this out, which
| involves one search, then clicking another page, then
| finding out the dates for the UK, then searching for the
| US, multiple pages, multiple paragraphs of text
|
| First result was the evening standard
|
| ---------
|
| What date do clocks go back in 2024 and when does British
| Summer Time end?
|
| Brits will get an extra hour of sleep from next month as
| the days get shorter and shorter.
|
| The temperatures are starting to drop, marking the end of
| summer - even if it's not going quietly. Nonetheless,
| autumn is well and truly on the way and that also marks
| the end of British Summer Time (BST).
|
| For those who aren't a fan of dark mornings, that means
| you'll gain one hour of sleep.
|
| The custom of changing the clocks twice a year has been
| around in the UK for over a century, taking place once in
| March and once in October.
|
| There's still a little while until the clocks change but
| the date is already known, as it always happens on the
| last Sunday of October.
|
| In 2019, the European Parliament voted to scrap mandatory
| daylight saving but Britain has no plans to, err, see the
| light.
|
| This is what it all means for the UK.
|
| When do the clocks go back?
|
| The clocks go back on Sunday, October 27 at 2am.
|
| ---------
|
| All that nonsense to parse and I still haven't got the US
| date
| nonameiguess wrote:
| Strange experience. I tried to replicate it by typing "US
| daylight savings time" into my URL bar and Duck Duck Go's
| summary blurb at the top of the results says "Daylight
| Savings Time Ends Sunday, November 3rd, 2024" and the
| first result is Wikipedia. Without even following it, the
| summary on the search page says "in the US, daylight
| savings time begins on the second Sunday in March and
| ends on the first Sunday in November."
|
| Hacker News commenters seem to consistently have far more
| trouble searching for things than I do and I don't get
| it.
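The date rules quoted in this subthread (last Sunday of October for the UK, second Sunday in March and first Sunday in November for the US) can be checked with a few lines of arithmetic. This is a minimal sketch using only the Python standard library; the helper name `nth_weekday` is made up for illustration.

```python
import calendar
from datetime import date

def nth_weekday(year, month, weekday, n):
    """Return the n-th given weekday of a month; n=-1 means the last one."""
    # Weeks start on Monday here, so week[weekday] lines up with
    # calendar.MONDAY (0) .. calendar.SUNDAY (6). Days outside the month are 0.
    weeks = calendar.Calendar(firstweekday=0).monthdayscalendar(year, month)
    days = [week[weekday] for week in weeks if week[weekday] != 0]
    return date(year, month, days[n - 1] if n > 0 else days[n])

uk_back = nth_weekday(2024, 10, calendar.SUNDAY, -1)    # last Sunday of October
us_forward = nth_weekday(2024, 3, calendar.SUNDAY, 2)   # second Sunday in March
us_back = nth_weekday(2024, 11, calendar.SUNDAY, 1)     # first Sunday in November
print(uk_back, us_forward, us_back)
```

For 2024 this agrees with the dates quoted above: October 27 for the UK and November 3 for the US.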
| skydhash wrote:
| They do questions-based, not query-based search. The
| trick is knowing the right keywords, which is fairly
| easy.
| gosub100 wrote:
| Tiny nit: it's daylight saving time.
| skydhash wrote:
| Because a search engine is not an answer engine. I just
| type 'daylight saving time uk' and 'daylight saving time
| us' and the answer was right at the top [0].
|
| You're supposed to give a query, not a question (even
| though google et al. have worked hard to trick people
| into that). Which is why search engines work for me even
| if there are a lot of garbage-filled sites.
|
| [0]: https://ibb.co/GpZ19nK (screenshot)
| mschuster91 wrote:
| > Because a search engine is not an answer engine.
|
| People have come to expect that though, and until a few
| years ago Google had actually gotten _really good_ at it,
| partially because people finally started using structured
| metadata to give context.
| msabalau wrote:
| Hmmm, not entirely certain about that metaphor.
|
| I do that sort of thing all the time. Sure it is nice to
| walk out with the Verne, but I am quite certain that I'll
| probably be walking out with several random books, with
| or without the one I was looking for.
| downWidOutaFite wrote:
| It's common. I get annoyed at my wife all the time for
| jumping to conclusions from some random piece of web
| info.
| acdha wrote:
| It's clearly different in that ChatGPT sounds
| authoritative but you still have to track down sources
| and make sure they're correctly summarized and accurate.
| Search doesn't give you the impression that you're doing
| anything else but ChatGPT always sounds authoritative
| even when it's wrong, which makes it a hazard for the
| people who need it the most because they don't have the
| personal expertise to recognize when it goes off track.
| ta1243 wrote:
| And webpages always sound authoritative even when they're
| wrong.
| acdha wrote:
| There's a key difference to understand: web pages have
| individual reputation. If I see something about the moon
| landings on NASA.gov I assign it a different trust level
| than something I read on youcanthandlethetruth.social,
| whereas LLM output comes with the imprimatur of the
| company which made the system. Some LLMs do generate
| citations but those don't always exist, come from
| authoritative sources, or say what they're listed as
| saying, and users are notoriously prone to not checking
| unless they're primed to be suspicious.
| torginus wrote:
| You don't understand! I need to procrastinate more
| efficiently!
| reportgunner wrote:
| People use these summaries to generate spam which they sell
| to advertising networks, that's why they keep talking about
| it.
| giancarlostoro wrote:
| That's fair, and there will always be people who want
| summaries.
| cultureswitch wrote:
| I generally agree with you when it comes to learning-focused
| content but there are definite cases where using an AI
| summary makes a lot of sense.
|
| Imagine searching for a guide on how to disassemble your
| laptop. Unfortunately, you can only find a 30 minute video
| which is full of rambling, ads or other things irrelevant to
| you. You can at least in theory use AI to produce a textual
| summary which contains only the disassembly instructions and
| relevant snapshots of the video.
|
| All professionals I've ever talked to seem to agree that
| videos are a terrible form of reference information (i.e. you
| need information to accomplish a task right now).
|
| The same applies to recipe websites: an AI that can throw all
| the fluff away is useful considering the annoying habit of
| the authors to seemingly write about everything but
| ingredients and the steps necessary to cook the dish.
|
| I think this relates to the
| https://nick.groenen.me/posts/the-4-types-of-technical-
| docum... as in any documentation that serves immediate work
| rather than learning should be straight to the point with as
| little clutter as possible.
| authorfly wrote:
| I totally agree. What is life living with just summaries?
|
| Podcasts and blog posts fall into "unique
| value/view/information I am learning" or entertainment
| "something that feels like a (parasocial) friend - content I
| can predictably expect and get some dopamine/sense of
| socialness from".
|
| Summaries for the former remove the eureka moments and brain
| connections between ideas, replacing them with takeaways, and
| summaries for the latter are like summarizing a TV episode in
| text: no entertainment tends to really come from it.
|
| I think it comes from having many messages at work, and I get
| that. When you have 50-100 messages/documents a day, quick
| summaries are a lifesaver, they help you filter, avoid, or
| get to the facts. But for things I select listening to.. for
| those hours of rest or (scientific) curiosity in my life..
| summaries are not a virtue.
|
| (for Parasocial - the feeling is: This person won't update me
| on their relationship problems, they'll explain a cool thing
| about castles to me and share their opinion, etc.)
| exe34 wrote:
| I don't read much online drivel, but the way I would describe
| my interest in AI summary/model building, is that I do read a
| few articles/books deeply, but these refer to many other
| things that it would be useful to have a general picture of
| in my mind, but I'm never going to put the manual effort into
| building that surrounding structure.
|
| E.g. I'm interested in classical art, and come across a lot
| of "he painted this while he was in $X before he moved to
| $Y". I'd like information about $X and $Y to be also
| available, how far apart are they, were they ruled by the
| same people, etc. But I won't be doing that sort of digging
| myself, I'd like it to show up next to what I'm reading,
| because I (will) have an AI reading along and doing this work
| for me.
| tylerekahn wrote:
| Check out https://kino.ai (YC S23)
| burningion wrote:
| Author here.
|
| Yes, this is a big feature I've been working on, should be
| ready for a beta by the end of the month.
|
| I allude to it in the post, but good search (for editing) is a
| challenge, and necessitates a mix of embeddings/vector search
| and text models.
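One way to picture the "mix of embeddings/vector search and text models" is a hybrid score that blends vector similarity with keyword overlap on the transcript. The sketch below is an illustration under stated assumptions, not the author's implementation: the embeddings are hand-written toy vectors (in practice they would come from an embedding model), and `hybrid_search`, `alpha`, and the clip fields are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Fraction of query words that appear in the transcript."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_search(query, query_vec, clips, alpha=0.5):
    """Rank clip ids by a weighted blend of vector and keyword scores."""
    scored = []
    for clip in clips:
        score = (alpha * cosine(query_vec, clip["embedding"])
                 + (1 - alpha) * keyword_score(query, clip["transcript"]))
        scored.append((score, clip["id"]))
    return [clip_id for _, clip_id in sorted(scored, reverse=True)]

# Toy data: the embeddings are made up for the example.
clips = [
    {"id": "clip_a", "transcript": "we talk about the budget for next year",
     "embedding": [0.9, 0.1, 0.0]},
    {"id": "clip_b", "transcript": "skateboard trick at the park",
     "embedding": [0.0, 0.2, 0.9]},
]
ranked = hybrid_search("budget interview", [1.0, 0.0, 0.0], clips)
print(ranked)
```

The `alpha` weight is the knob here: closer to 1 favours semantic similarity, closer to 0 favours exact words from the transcript.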
| liotier wrote:
| Derushing in general is the most time consuming, so not only
| language pattern recognition but also image recognition:
| "From the rushes, extract all the sequences with bicycle
| crashes to give me a pile of clips to use in my edit" !
| burningion wrote:
| Yes, agreed.
|
| I film a bunch of skateboarding, and it can take tens of
| tries to land a trick. Similarly, there's usually a unique
| sound that signals a trick was finally landed.
|
| Good multi-modal search and discovery is a huge part of
| cracking the editing problem.
| liotier wrote:
| Looks like https://kino.ai addresses that derushing
| stage, but as a specialized tool rather than as a
| function inside a video editor - which makes a lot of
| sense to me.
| sitkack wrote:
| Detect the cheer everyone makes when the trick lands.
| Lots of proxy indicators to key off of.
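A cheer or a landing sound is usually much louder than the footage around it, so one simple proxy indicator is a jump in audio energy. The following is a rough sketch, assuming mono samples as floats in a plain list; `loud_windows`, the window length, and the `factor` threshold are illustrative choices, and a real detector would likely need something more robust than raw loudness.

```python
import math

def loud_windows(samples, sample_rate, window_s=0.5, factor=3.0):
    """Return start times (seconds) of windows whose RMS energy
    exceeds `factor` times the median window RMS."""
    size = int(sample_rate * window_s)
    rms = []
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        rms.append(math.sqrt(sum(s * s for s in window) / size))
    if not rms:
        return []
    median = sorted(rms)[len(rms) // 2]
    return [i * window_s for i, r in enumerate(rms) if r > factor * median]

# Synthetic 10-second track at 100 samples/s: quiet, with a loud burst at 6.0 s.
samples = [0.01] * 600 + [0.8] * 50 + [0.01] * 350
events = loud_windows(samples, sample_rate=100)
print(events)
```

The returned timestamps can then be used as candidate cut points to review, rather than as final decisions.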
| trinix912 wrote:
| Tens? It sometimes takes my crew hundreds of tries (all
| on DV tapes).
|
| How far have you been able to come with search for trick
| variations? It would be interesting to see a system that
| can reliably recognize what's switch, nollie vs fakie
| etc. Then have it generate a list of all tricks for each
| skater and perhaps outstanding fails. Just some thoughts.
| nashashmi wrote:
| > I allude to it
|
| And that's why I read the comments to see if anyone mentioned
| it.
|
| To be able to literally take the source files used to put the
| video together and edit each piece individually would be
| great.
|
| I wanted to create a car driving down a road covered in
| arches of greenery. I got lots of great options but I wanted
| a particular combination of options. If I could do something
| like that with video, that would be terrific
| klabb3 wrote:
| As an outsider: sounds like the main value lies in the AI
| extracting detailed and accurate (but heuristic) metadata from
| video: audio transcriptions, text, people, environment and
| objects.
|
| Once that's there, you can use it for organizing, searching,
| filtering, or whatever you want. It does not need to be coupled
| with an LLM-based interface.
|
| ML models for eg face & object recognition have been deployed
| in both local- and cloud based photo organization for at least
| a decade. I very much welcome transformers to do a much better
| job, but I also very much reject the everything-is-a-prompt
| hammer as a solution to all problems. _Especially_ in deep and
| professional workflows where details matter.
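The separation described here (extract metadata once, then organize and filter without a prompt) could look roughly like the sketch below. The `ClipMetadata` record and its fields are hypothetical, and the values are hand-written; in practice they would be filled in by transcription and recognition models.

```python
from dataclasses import dataclass, field

@dataclass
class ClipMetadata:
    """Hypothetical per-clip record produced once by recognition models."""
    path: str
    start_s: float
    end_s: float
    transcript: str = ""
    people: set = field(default_factory=set)
    objects: set = field(default_factory=set)

def find_clips(index, person=None, obj=None, phrase=None):
    """Plain filtering over stored metadata; no prompt or LLM involved."""
    matches = []
    for clip in index:
        if person is not None and person not in clip.people:
            continue
        if obj is not None and obj not in clip.objects:
            continue
        if phrase is not None and phrase.lower() not in clip.transcript.lower():
            continue
        matches.append(clip)
    return matches

index = [
    ClipMetadata("a.mov", 0.0, 12.5, "so the budget was tight", {"alice"}, {"desk"}),
    ClipMetadata("b.mov", 3.0, 9.0, "watch this crash", {"bob"}, {"bicycle"}),
]
bike_clips = find_clips(index, obj="bicycle")
print([c.path for c in bike_clips])
```

Because the query is an ordinary function call with explicit fields, results are deterministic and easy to audit, which is the property that matters in detail-sensitive workflows.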
| wk_end wrote:
| You should check out scenery.video (disclaimer: I have a
| relationship with the company)
| sfmike wrote:
| What do you think of this versus the AI that hires actors who
| are then reused as models in the videos via script?
| burningion wrote:
| Author here. I imagine that being one of the components you can
| "plug in" to what I'm building.
|
| Imagine taking in a prompt, which describes the video you'd
| like generated. At render time you pass along variables which
| get injected to describe the specifics for your audience.
|
| We can then adjust the video edit according to that audience,
| including mixing generated and non-generated content.
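The "variables injected at render time" idea can be sketched with a plain string template. This is only an illustration of the pattern; the template text, the variable names, and `render_prompt` are assumptions made up for the example, not the author's API.

```python
from string import Template

# Hypothetical prompt template; ${...} placeholders are filled at render time.
PROMPT = Template(
    "A ${length_s}-second product intro for ${audience}, "
    "in a ${tone} tone, opening on ${opening_shot}."
)

def render_prompt(template, **variables):
    """Fill the template; substitute() raises KeyError if a variable is missing."""
    return template.substitute(**variables)

prompt = render_prompt(
    PROMPT,
    length_s=30,
    audience="skateboarders",
    tone="playful",
    opening_shot="a rooftop ramp",
)
print(prompt)
```

Using `substitute` rather than `safe_substitute` makes a missing audience variable fail loudly instead of producing a half-filled prompt.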
| lukaqq wrote:
| Impressive blog! I am building a professional web video editor -
| https://chillin.online and trying to embed various AI workflows
| into it. Your article has given me a lot of inspiration. Thank
| you!
| b-lee wrote:
| Looks so interesting..
| SCUSKU wrote:
| Love the author sharing their winding journey as well as the
| tools and things they learned along the way. You can tell the
| author did grow a lot through this process, and through the year.
| Great stuff, thanks for sharing these great tips :D
| Narciss wrote:
| Good work on pushing through. It's like you say, building
| anything is an achievement.
| ericmcer wrote:
| Seriously, every person needs the opportunity to really throw
| themselves into creating something for a year. I think so many
| people walk around thinking "if only I had
| time/money/space/whatever I could do something amazing".
|
| It is really humbling to actually try it and realize how
| difficult making anything original is. You also realize that...
| you just might not be talented haha.
| 1oooqooq wrote:
| Did I miss something, or is this "video editing was too hard so I
| just made a Wikipedia reading bot that generates drivel for
| Instagram and TikTok at the same time"?
| burningion wrote:
| Author here.
|
| This is a genuine concern of mine! I don't want to build
| something that generates slop.
|
| Rather, I think whenever we change the costs / process of
| things, new possibilities open up.
|
| As an example, last night I re-watched Starship Troopers for
| the six-hundredth time. I'm a huge fan of Paul Verhoeven.
|
| What if I could watch a custom edit of Starship Troopers on
| demand, and this edit surprised me with something new? I don't
| know exactly how this would look, but maybe it's interesting?
|
| It's tough to predict the future and how things will change.
|
| But I'd rather be participating in its creation, trying to make
| it better.
| scudsworth wrote:
| >What if I could watch a custom edit of Starship Troopers on
| demand, and this edit surprised me with something new? I
| don't know exactly how this would look, but maybe it's
| interesting?
|
| this is not interesting whatsoever actually
| burningion wrote:
| lol alright, appreciate the skepticism.
|
| What if, instead of an algorithm designed to hold your
| attention captive to sell you shit, you had a feed of videos
| created to help you focus on what you aspire to learn / be
| / do?
|
| idk probably bullshit too, but why not?
| hackable_sand wrote:
| You have no idea what you're doing do you
|
| Bfr
| jeffmcjunkin wrote:
| We don't advance as a society unless people ask new
| questions. Having folk willing to spend some time
| answering those questions (in public, no less!) helps
| others. It's really, really damn hard to predict how
| advancements in one area can help another.
|
| All that said, thanks for your interesting new question,
| and thanks for spending time on it :D
| AlienRobot wrote:
| >What if I could watch a custom edit of Starship Troopers on
| demand, and this edit surprised me with something new? I
| don't know exactly how this would look, but maybe it's
| interesting?
|
| Is this what you want to do?
| https://www.youtube.com/watch?v=6sUR6ylVH7E
| Arelius wrote:
| Because I don't see it mentioned elsewhere, I wanted to plug
| OpenTimelineIO, as a lot of the industry is building support
| around it as a format right now, and it would be great for any
| new video editor to support.
|
| https://opentimelineio.readthedocs.io/en/stable/
| mcdow wrote:
| It seems like we are currently in the "skeuomorphic" product
| design era for AI products. Which is to say we are building the
| same products but with AI tacked on. I appreciate that you are
| approaching this problem from first principles and attempting to
| break from the model of the previous generation. Kudos.
___________________________________________________________________
(page generated 2024-09-24 23:01 UTC)