[HN Gopher] Chaining FFmpeg with a Browser Agent
       ___________________________________________________________________
        
       Chaining FFmpeg with a Browser Agent
        
       Author : shardullavekar
       Score  : 85 points
       Date   : 2025-11-04 12:52 UTC (10 hours ago)
        
 (HTM) web link (100x.bot)
 (TXT) w3m dump (100x.bot)
        
       | sylware wrote:
       | HTML <video> or <audio> element with "Streaming" URLs passed to
       | the media player (or internally in the web browser for the big
       | ones).
        
       | utopiah wrote:
       | Have to admit, _ffmpeg_ syntax is not trivial... but also the
       | project is 24 years old and is basically the de facto industry
       | standard. If you believe you will still be editing videos in 20
       | years with the CLI (or any other tool or any programming
       | language) wrapping it, then it's probably worth a few hours
       | learning how it actually works.
        
         | shardullavekar wrote:
         | true, companies like Descript, Veed, or Kapwing exist because
         | no-coders find this syntax intimidating. Plus, a CLI tool
         | sits outside of a workflow. We wanted to change that.
        
           | petetnt wrote:
           | Don't "no coders" find the concepts described in this article
           | intimidating?
           | 
           | The article states that whatever it is trying to describe
           | "Takes about ~20-30 mins. The cognitive load is high....",
           | while its own step of "Googling "ffmpeg combine static image
           | and audio."" gives you the literal command you need to run
           | from a known source (superuser.com, sourced from the ffmpeg
           | wiki).
           | 
           | Anyone even slightly familiar with ffmpeg should be able to
           | produce the same result in minutes. For someone who doesn't
           | understand what ffmpeg is the article means absolutely
           | nothing. How does a "no coder" understand what an "agent in a
           | sandboxed container" is?
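For reference, the widely circulated answer to "ffmpeg combine static image and audio" loops the image with `-loop 1` and stops at the end of the audio with `-shortest`. A minimal Python sketch that builds that argument list follows; the function and file names are placeholders, and the exact flags in the answer petetnt found may differ slightly.

```python
import shlex
from typing import List


def still_image_with_audio(image: str, audio: str, output: str) -> List[str]:
    """Build an ffmpeg argv that shows one still image for the length of an audio track."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the single image as an endless video stream
        "-i", image,
        "-i", audio,
        "-c:v", "libx264",
        "-tune", "stillimage",   # x264 tuning for mostly static content
        "-c:a", "aac",
        "-b:a", "192k",
        "-pix_fmt", "yuv420p",   # widest player compatibility
        "-shortest",             # stop when the audio ends (the looped image never would)
        output,
    ]


if __name__ == "__main__":
    # Placeholder file names; pass the list to subprocess.run(argv, check=True) to execute.
    print(shlex.join(still_image_with_audio("cover.jpg", "track.mp3", "out.mp4")))
```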
        
             | shardullavekar wrote:
             | we took a basic example and described it. (will try adding
             | a more complex one)
             |
             | we have our designer/intern in mind, who creates shorts,
             | adds subtitles, crops them, and merges the generated audio.
             | He is aware of ffmpeg and prefers using a SaaS UI on top of
             | it.
             |
             | However, we see him hanging out on ChatGPT or Gemini all
             | the time. He is literally the no-coder we have in mind.
             |
             | We just combined his "type what you want" habit with ffmpeg
             | workflows.
        
               | EraYaN wrote:
               | Wouldn't that intern just use an NLE (be it Premiere,
               | DaVinci Resolve, etc.) anyway? If you need to style
               | subtitles and edit shorts and video content, you'll need
               | a proper editor anyway.
        
               | shardullavekar wrote:
               | 1. Download a larger video from S3.
               | 2. Use an NLE to cut it into shorts (crop, resize,
               |    subtitles, etc.).
               | 3. Upload the shorts to YouTube, Instagram, and TikTok.
               |
               | He does use DaVinci Resolve, but only for step 2.
               |
               | NLEs make ffmpeg a standalone yet easy-to-use tool.
               |
               | Not denying that the major heavy lifting is done by the
               | NLE. We go a step further and make it embeddable in a
               | larger workflow.
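Step 2 of that list (crop, resize, subtitles) can also be expressed as a single ffmpeg call. The sketch below is an illustration, not 100x's recipe: it assumes a landscape source, a centre 9:16 crop, a 1080x1920 output, and an ffmpeg build with libass for the `subtitles` filter; the function name and file names are hypothetical.

```python
from typing import List, Optional


def vertical_short(src: str, start: str, end: str, out: str,
                   subs: Optional[str] = None) -> List[str]:
    """Build an ffmpeg argv that cuts [start, end] from src and reframes it as a 9:16 short."""
    filters = [
        "crop=ih*9/16:ih",   # centre crop to a 9:16 portrait window (assumes a landscape source)
        "scale=1080:1920",   # a common portrait resolution for Shorts/Reels/TikTok
    ]
    if subs:
        filters.append(f"subtitles={subs}")  # burn in subtitles; needs ffmpeg built with libass
    return [
        "ffmpeg",
        "-ss", start, "-to", end,   # input-side seek: only the requested segment is decoded
        "-i", src,
        "-vf", ",".join(filters),   # filters in a chain are separated by commas
        "-c:a", "aac",
        out,
    ]
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting; subtitle paths containing `:` or `'` would still need ffmpeg's own filter escaping inside `subtitles=`.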
        
         | artpar wrote:
         | I think that goes for almost every tool you want to use with an
         | LLM. Ideally the user already knows the tool, so mistakes by the
         | LLM can be caught before they happen.
         |
         | Here, making ffmpeg "just another capability" allows it to be
         | stitched together into workflows.
        
         | jack_pp wrote:
         | I agree. I suggest using this instead:
         | https://github.com/kkroening/ffmpeg-python . While not perfect,
         | once you figure it out it is far easier to use, and you can wrap
         | more complicated workflows and reuse them later.
        
           | poly2it wrote:
           | Kkroening's wrapper has been inactive for some time. I
           | suggest using https://github.com/jonghwanhyeon/python-ffmpeg
           | instead. It has proper async support and a better API.
        
             | jack_pp wrote:
             | Thing is, if you want to use LLMs for mockups you got to
             | use the old one.
        
             | jack_pp wrote:
             | Scratch that I thought it was a different version. The one
             | you linked has no support for filtergraphs so isn't even
             | comparable to the old one.
        
         | esperent wrote:
         | The syntax isn't too bad. The problem is that I have to use it
         | a couple of times a year, on average. So every time I've
         | forgotten and have to relearn. This doesn't happen with GUIs
         | nearly as much, and it's why I prefer them over CLI tools for
         | anything that I don't do at least once every week or two.
        
           | skydhash wrote:
           | That's why you write scripts, or put a couple snippets in
           | your notes.
        
             | esperent wrote:
             | I do have snippets in my notes. The problem is that nearly
             | every time I use it, I need to do something different than
             | the previous time.
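One way to make notes-style snippets survive "something different every time" is to store them as parameterized templates instead of literal commands. A minimal sketch, with a hypothetical snippet set and placeholder names:

```python
from typing import Dict, List

# Hypothetical personal snippet library: each entry is an argv template with {placeholders}.
SNIPPETS: Dict[str, List[str]] = {
    # Quick trim with stream copy (no re-encode); the start may snap to a keyframe.
    "trim": ["ffmpeg", "-ss", "{start}", "-to", "{end}", "-i", "{src}", "-c", "copy", "{out}"],
    # Extract the audio track without re-encoding (pick an extension that matches the codec).
    "extract_audio": ["ffmpeg", "-i", "{src}", "-vn", "-c:a", "copy", "{out}"],
    # Scale to a given width, keeping the aspect ratio (-2 keeps the height even).
    "resize": ["ffmpeg", "-i", "{src}", "-vf", "scale={width}:-2", "{out}"],
}


def render(name: str, **params: str) -> List[str]:
    """Fill in a snippet's placeholders; raises KeyError if a parameter is missing."""
    return [part.format(**params) for part in SNIPPETS[name]]
```

Stream-copy trims are fast but can only start cleanly on a keyframe; re-encode instead if frame accuracy matters.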
        
         | Sean-Der wrote:
         | My question/curiosity is why do so many people use ffmpeg
         | (frustrated by the syntax) when GStreamer is available?
         | 
         | `gst-launch-1.0 filesrc ! qtdemux ! matroskamux ! filesink...`
         | people would be less frustrated maybe?
         |
         | People would also learn a little more and be less frustrated
         | when conversations about container/codec/colorspace etc. come
         | up. Each has a dedicated element, so you can better understand
         | its I/O.
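To make the "one element per job" point concrete, here is a sketch that assembles a remux pipeline for `gst-launch-1.0` from a list of elements. It uses GStreamer's `qtdemux`, `h264parse`, and `matroskamux` elements and assumes a video-only MP4 with H.264 video; files with audio or other codecs need a different (branched) pipeline. The file names and helper name are placeholders.

```python
import shlex
from typing import List

# Each element does one job; gst-launch-1.0 links them left to right with "!".
# Assumption: a video-only MP4 carrying H.264 (hence h264parse).
ELEMENTS: List[str] = [
    "filesrc location=in.mp4",    # read bytes from disk
    "qtdemux",                    # split the MP4/QuickTime container into elementary streams
    "h264parse",                  # parse H.264 so the muxer receives proper caps
    "matroskamux",                # wrap the stream in a Matroska container
    "filesink location=out.mkv",  # write bytes to disk
]


def pipeline_argv(elements: List[str]) -> List[str]:
    """Turn an element list into an argv for gst-launch-1.0."""
    argv = ["gst-launch-1.0"]
    for i, element in enumerate(elements):
        if i:
            argv.append("!")
        argv.extend(shlex.split(element))  # "filesrc location=in.mp4" -> two arguments
    return argv


if __name__ == "__main__":
    print(shlex.join(pipeline_argv(ELEMENTS)))
```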
        
           | artpar wrote:
           | I did not know gstreamer wasm also exists, I'll check it out
        
             | goeiedaggoeie wrote:
             | Still has a way to go, but very exciting.
        
           | throwaway2046 wrote:
           | I haven't tried GStreamer but I found FFmpeg to be extremely
           | easy to compile as both a command line tool and library, plus
           | it can do so much out of the box even without external
           | libraries being present. It's already used in pretty much
           | everything and does the job so it never occurred to me (or
           | others) to look for alternatives.
        
         | javier2 wrote:
         | ffmpeg is pretty complicated, but at least it actually works.
        
         | somat wrote:
         | The thing that helped me get over that ffmpeg hump, where you
         | go from copying Stack Overflow answers to actually sort of
         | understanding what you are doing, is the fairly recent "include
         | external file" syntax. On the surface it is such a minor thing,
         | but mentally it let me turn what was a confusing mess into a
         | programming language. There are a couple of ways to invoke it,
         | but the one I used was to load the whole file as an arg. Note
         | the slash, it is important: "-/filter_complex filter_file"
         | 
         | https://ffmpeg.org/ffmpeg-filters.html#toc-Filtergraph-synta...
         | 
         | "A special syntax implemented in the ffmpeg CLI tool allows
         | loading option values from files. This is done by prepending a
         | slash '/' to the option name, then the supplied value is
         | interpreted as a path from which the actual value is loaded."
         | 
         | For how critical that was to getting over my ffmpeg hump, I
         | wish it was not buried halfway through the documentation, but
         | also, I don't know where else it would go.
         | 
         | And just because I am very proud of my accomplishment here is
         | the ffmpeg side of my project, motion detection using mainly
         | ffmpeg, there is some python glue logic to watch stdout for the
         | events but all the tricky bits are internal to ffmpeg.
         | 
         | The filter (comments are added for audience understanding):
         |
         |     [0:v]
         |         split  #split the camera feed into two parts, passthrough and motion
         |             [vis],
         |         scale= #scale the motion feed way down, less CPU and it works better
         |             w=iw/4:
         |             h=-1,
         |         format= #needed because blend did not work as expected with yuv
         |             gbrp,
         |         tmix= #temporal blur to reduce artifacts
         |             frames=2,
         |         [1:v]  #the mask frame
         |         blend= #mask the motion feed
         |             all_mode=darken,
         |         tblend= #motion detect actual, the difference from the last frame
         |             all_mode=difference,
         |         boxblur= #blur the hell out of it to increase the number of motion pixels
         |             lr=20,
         |         maskfun= #mask it to black and white
         |             low=3:
         |             high=3,
         |         negate, #make the motion pixels black
         |         blackframe= #puts events on stdout when too many black pixels are found
         |             amount=1
         |         [motion]; #motion output
         |     [vis]
         |         tpad= #delay passthrough so you get the start of the event when notified
         |             start=30
         |         [original]; #passthrough output
         | 
         | and the ffmpeg invocation:
         |
         |     ff_args = [
         |         'ffmpeg',
         |         '-nostats',
         |         '-an',
         |         '-i',
         |         camera_loc,  # a security camera
         |         '-i',
         |         'zone_all.png',  # mask as to which parts are relevant for motion detection
         |         '-/filter_complex',
         |         'motion_display.filter',  # the filter doing all the work
         |         '-map',  # sort out the outputs from the filter
         |         '[original]',
         |         '-f',
         |         'mpegts',  # I feel a little weird using mpegts, but it was the best "streaming" format I tried
         |         'udp://127.0.0.1:8888',  # collect the full video from here
         |         '-map',
         |         '[motion]',
         |         '-f',
         |         'mpegts',
         |         'udp://127.0.0.1:8889',  # collect the motion output from here, mainly for debugging
         |     ]
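somat mentions Python glue that watches ffmpeg's output for the `blackframe` events. That code isn't shown, so the sketch below is an independent guess at such glue, not somat's implementation. Assumptions: the reports arrive as ffmpeg log lines (which ffmpeg writes to stderr by default), and they follow the usual `frame:N pblack:P ... t:T` layout of the blackframe message; check the regex against your build's actual output.

```python
import re
import subprocess
from typing import Iterable, Iterator, List, Tuple

# Assumed layout of a blackframe report, for example:
# "[Parsed_blackframe_10 @ 0x55d0c8] frame:412 pblack:99 pts:412000 t:13.733333 type:P last_keyframe:400"
BLACKFRAME_RE = re.compile(r"blackframe.*?\bframe:(\d+)\s+pblack:(\d+).*?\bt:([\d.]+)")


def motion_events(lines: Iterable[str]) -> Iterator[Tuple[int, int, float]]:
    """Yield (frame number, percent of black pixels, timestamp in seconds) per report."""
    for line in lines:
        match = BLACKFRAME_RE.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2)), float(match.group(3))


def watch(ff_args: List[str]) -> int:
    """Run ffmpeg and print each motion event as it is logged; returns ffmpeg's exit code."""
    proc = subprocess.Popen(ff_args, stderr=subprocess.PIPE, text=True)
    assert proc.stderr is not None
    for frame, pblack, t in motion_events(proc.stderr):
        print(f"motion at {t:.2f}s (frame {frame}, {pblack}% of pixels flagged)")
    return proc.wait()
```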
        
       | jack_pp wrote:
       | As someone who has used ffmpeg for 10+ years maintaining a
       | relatively complex backend service that's basically a JSON to
       | ffmpeg translator I did not fully understand this article.
       | 
       | Like, the Before vs After section doesn't even seem to create the
       | same thing: the before has no speedup, the after does.
       | 
       | In the end it seems they basically created a few services
       | ("recipes") that they can reuse to do simple stuff like speed-up
       | 2x or combine audio / video or whatever
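The usual ffmpeg way to do a "speed-up 2x" recipe like the one jack_pp mentions is `setpts` for video and `atempo` for audio. A minimal sketch; the function name is hypothetical, and it assumes the input has both a video and an audio stream.

```python
from typing import List


def speed_up(src: str, out: str, factor: float = 2.0) -> List[str]:
    """Build an ffmpeg argv that plays both video and audio `factor` times faster."""
    # Older atempo builds only accept 0.5-2.0 per instance; stay inside that range.
    if not 0.5 <= factor <= 2.0:
        raise ValueError("factor must be between 0.5 and 2.0")
    graph = (
        f"[0:v]setpts=PTS/{factor}[v];"  # divide timestamps so frames are shown sooner
        f"[0:a]atempo={factor}[a]"       # change audio tempo without changing pitch
    )
    return ["ffmpeg", "-i", src, "-filter_complex", graph, "-map", "[v]", "-map", "[a]", out]
```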
        
         | shardullavekar wrote:
         | thanks for calling it out, I will correct the before vs after
         | section. But you can describe any ffmpeg capability in plain
         | English and the underlying ffmpeg tool call takes care of it.
        
           | jack_pp wrote:
           | I have written a lot of ffmpeg-python and plain ffmpeg
           | commands using LLMs, and while I am amazed at how well Gemini
           | or ChatGPT can handle ffmpeg prompts, they are still not 100%
           | reliable, so this seems to me like a big gamble on your part.
           | However, it might work for most users who only ask for simple
           | things.
        
             | shardullavekar wrote:
             | so creators on 100x will create well-defined workflows that
             | others can reuse. If a workflow is not found, the LLM creates
             | one on the fly and saves it.
        
               | jack_pp wrote:
               | That sounds good: save the LLM-generated workflows and
               | have them edited by more seasoned users.
               | 
               | Or you could go one step further and create a special
               | workflow which would allow you to define some inputs and
               | iterate with an LLM until the user gets what he wants but
               | for this you would need to generate outputs and have the
               | user validate what the LLM has created before finally
               | saving the recipe.
        
               | shardullavekar wrote:
               | That's exactly how it is implemented!
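The save-then-validate loop jack_pp describes (and shardullavekar says 100x implements) could look roughly like the sketch below. It is not 100x's code: `RecipeStore`, the `draft` callable (standing in for an LLM) and the `approve` callable (standing in for a user checking rendered output) are all hypothetical, and keying recipes by the exact request text is a simplification.

```python
from typing import Callable, Dict, List, Optional

Recipe = List[str]  # an ffmpeg argv template with {placeholders}


class RecipeStore:
    """Hypothetical store: reuse saved recipes, otherwise draft one and keep it only once approved."""

    def __init__(self) -> None:
        self.recipes: Dict[str, Recipe] = {}

    def get_or_create(
        self,
        request: str,
        draft: Callable[[str], Recipe],     # stand-in for an LLM call that drafts a recipe
        approve: Callable[[Recipe], bool],  # stand-in for rendering a preview and asking the user
        max_attempts: int = 3,
    ) -> Optional[Recipe]:
        if request in self.recipes:         # a known workflow: reuse it as-is
            return self.recipes[request]
        for _ in range(max_attempts):       # iterate until the user accepts a draft
            candidate = draft(request)
            if approve(candidate):
                self.recipes[request] = candidate  # save only validated recipes
                return candidate
        return None                         # nothing approved; nothing saved
```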
        
       | IsTom wrote:
       | > Half of scripting FFmpeg is just fighting with shell quote
       | escaping for filter_complex.
       | 
       | -filter_complex_script is a thing
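Both `-filter_complex_script` and the `-/filter_complex` prefix described in the ffmpeg docs read the filtergraph from a file, so the graph never passes through the shell. A minimal sketch, assuming a build that supports whichever option you pick; the graph, file names, and helper name are placeholders.

```python
import os
import tempfile
from typing import List

# Newlines and spaces between filters are fine in a filtergraph file; no shell escaping needed.
GRAPH = """\
[0:v] split [a][b];
[a] scale=w=iw/4:h=-2 [small];
[b] null [full]
"""


def argv_with_graph_file(src: str, graph_path: str, legacy: bool = False) -> List[str]:
    """Build an ffmpeg argv that loads its -filter_complex value from graph_path."""
    option = "-filter_complex_script" if legacy else "-/filter_complex"
    return [
        "ffmpeg", "-i", src,
        option, graph_path,              # the value is a path; ffmpeg reads the graph from it
        "-map", "[small]", "small.mp4",  # labels defined inside the file
        "-map", "[full]", "full.mp4",
    ]


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".filter", delete=False) as fh:
        fh.write(GRAPH)
    try:
        print(argv_with_graph_file("input.mp4", fh.name))
    finally:
        os.unlink(fh.name)
```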
        
       | 4gotunameagain wrote:
       | This is YC propping up a startup they have backed; there isn't
       | much substance here.
        
       | coachgodzup wrote:
       | I considered FFmpeg a great project, but I usually avoid using
       | it directly because of its rather complex syntax. I'm
       | reconsidering that, because coupled with an LLM it is very
       | straightforward and more immediate than a typical graphical editor.
        
         | orbital-decay wrote:
         | At some point command line becomes unwieldy. FFmpeg would
         | definitely benefit from a non-arcane DSL like AviSynth or a
         | node-based UI.
        
       | skeeter2020 wrote:
       | This doesn't make any sense; the Before and After examples
       | accomplish different things. I also don't get who the target
       | audience is; people intimidated by a CLI tool but at home with
       | technical agents?
        
         | shardullavekar wrote:
         | people intimidated by a CLI tool who nonetheless find tools like
         | ChatGPT easy to use, and those who have video editing as part of
         | a larger workflow.
        
       | sanjit wrote:
       | An aside but related?
       | 
       | FFmpeg has complex syntax because it's dealing with the
       | _complexity of video_. I agree with everyone about knowing (and
       | helping create or contribute to) our tools.
       | 
       | Today I largely forget about the _legacy_ of video, the technical
       | challenges, and how critical it was to get it right.
       | 
       | There are an incredible number of output formats and
       | considerations for _current_ screens (desktop, tablet, mobile,
       | tv, etc...). Then we have a whole other world on the creation
       | side for capture, edit, live broadcast...
       | 
       | On legacy formats it used to be so complex with standards,
       | requirements, and evolving formats. Today, we don't even think
       | about why 29.97fps is still around. Interlacing?
       | 
       | We have a mix of so many incredible (and sometimes frustrating)
       | codecs, needs and final outputs, so it's really amazing the power
       | we have with a tool like FFmpeg... It's daunting but really well
       | thought out.
       | 
       | So just a big thanks to the FFmpeg team for all their incredible
       | work over the years...
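On sanjit's 29.97fps aside: the rate introduced with NTSC colour broadcasting is exactly 30000/1001, i.e. nominal 30 fps slowed by a factor of 1000/1001 (ffmpeg accepts it as `30000/1001` or the abbreviation `ntsc`). A small arithmetic sketch of why that matters for timecode; the function name is illustrative.

```python
from fractions import Fraction

NTSC_RATE = Fraction(30000, 1001)  # what "29.97 fps" actually is


def timecode_drift(real_seconds: Fraction) -> Fraction:
    """Seconds by which a naive 30 fps frame count lags the wall clock."""
    frames = real_seconds * NTSC_RATE  # frames actually shown in that time
    naive_seconds = frames / 30        # elapsed time if you assume 30 frames per second
    return real_seconds - naive_seconds


print(float(NTSC_RATE))
print(float(timecode_drift(Fraction(3600))))  # one hour: about 3.6 s, hence drop-frame timecode
```

Over one hour the lag works out to exactly 3600/1001 seconds, which is the gap drop-frame timecode compensates for by skipping frame numbers.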
        
         | shardullavekar wrote:
         | No second thoughts about it; we are only making ffmpeg more
         | accessible and embeddable.
        
         | echelon wrote:
         | > FFmpeg has complex syntax because it's dealing with the
         | _complexity of video_.
         | 
         | It's dealing with 3D data (more if you count audio or other
         | tracks) and multi-dimensional transforms from a command line.
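A quick way to see echelon's "3D data" point is to compute the size of uncompressed video: width x height per frame, times the number of frames. The sketch assumes 8-bit yuv420p, where full-resolution luma plus quarter-resolution chroma averages 1.5 bytes per pixel.

```python
def raw_bytes(width: int, height: int, frames: int, bytes_per_pixel: float = 1.5) -> int:
    """Size of uncompressed video; 1.5 bytes/pixel matches 8-bit yuv420p (4:2:0 chroma)."""
    return int(width * height * bytes_per_pixel * frames)


frame = raw_bytes(1920, 1080, 1)         # one 1080p frame
minute = raw_bytes(1920, 1080, 30 * 60)  # one minute at 30 fps
print(f"{frame:,} bytes per frame, {minute / 1e9:.1f} GB per minute")
```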
        
         | charcircuit wrote:
         | >FFmpeg has complex syntax because it's dealing with the
         | _complexity of video_
         | 
         | It's complexity paired with bad design, making the situation
         | worse than it could be.
        
           | SpaceManNabs wrote:
           | I refuse to admit that ffmpeg is bad design until I see a
           | better one. So if you have a better one, I am all ears, because
           | it would surely be very illuminating.
        
       | kwanbix wrote:
       | I use ChatGPT for this kind of complexity.
       | 
       | It works 99% of the time for my use case.
        
         | shardullavekar wrote:
         | jack_pp made a point in the comments, worth noting.
        
       | Dachande663 wrote:
       | ffmpeg is the only community where I've asked for help and been
       | told "if you have to ask, you're too stupid to use this project".
       | Needless to say, it was a welcoming community I continued
       | engaging with.
        
         | pinter69 wrote:
         | People in the community can be hardcore there sometimes,
         | r/ffmpeg especially. But, there are communities online and
         | information resources that help.
         | 
         | This is a nice resource:
         | https://amiaopensource.github.io/ffmprovisr/
         | 
         | And also I've written this cheatsheet, which is designed to be
         | used alongside an LLM:
         | https://github.com/rendi-api/ffmpeg-cheatsheet
         | 
         | Let me know if you're interested in more resources
        
       | oldgregg wrote:
       | AI is a game changer for the wildly detailed ffmpeg command line:
       | just tell GPT what you want to do and it will spit out the ffmpeg
       | command. 10/10.
        
       | officeplant wrote:
       | FFmpeg continues to be the great filter of those that don't RTFM.
        
         | tartoran wrote:
         | Not really, LLMs get it quite right.
        
       | javier2 wrote:
       | ffmpeg is awful, except for all the other tools, which are
       | awfuller and don't even work
        
       | usrxcghghj wrote:
       | Read the entire landing page. Still do not understand what 100x
       | bot is.
        
       | arjie wrote:
       | I just tell Claude Code what I want to do and that it has
       | ImageMagick and ffmpeg available, and it does all the work for me.
       | Because it's got an agentic flow, it loops around, checks the
       | output and fixes things up.
       | 
       | I can ask it to orient people the right way, crop to the
       | important parts, etc. and it will figure out what "the right
       | way", "the important parts", etc. are. Sometimes I have to give
       | it some light hints like "extract n frames from before y to
       | figure out things", but most of the time it just does it.
       | 
       | Claude Code acts like a very general purpose agent for me. About
       | the one thing that I have to manually do that I'm annoyed by is
       | editing 360 videos into a flow. I'd like to be able to tell
       | Claude Code to "follow my daughter as I dunk her in the pool" and
       | stuff like that but I have to do that myself in the GoPro editor.
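A hint like arjie's "extract n frames from before y" maps onto a plain ffmpeg invocation. The sketch below is illustrative only (not what Claude Code actually runs); the function name, the `window` parameter, and the output pattern are hypothetical.

```python
from typing import List


def frames_before(src: str, y: float, n: int, window: float,
                  pattern: str = "frame_%03d.png") -> List[str]:
    """Build an ffmpeg argv that samples n stills spread over the `window` seconds before time y."""
    start = max(0.0, y - window)
    duration = y - start
    if duration <= 0 or n <= 0:
        raise ValueError("need a positive window before y and at least one frame")
    fps = n / duration  # output rate that yields n frames across the window
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",    # seek before opening the input (fast)
        "-t", f"{duration:.3f}",  # read only the window
        "-i", src,
        "-vf", f"fps={fps:.6f}",
        "-frames:v", str(n),      # cap the count in case of rounding
        pattern,                  # image2 numbering: frame_001.png, frame_002.png, ...
    ]
```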
        
       ___________________________________________________________________
       (page generated 2025-11-04 23:00 UTC)