[HN Gopher] Chaining FFmpeg with a Browser Agent
___________________________________________________________________
Chaining FFmpeg with a Browser Agent
Author : shardullavekar
Score : 85 points
Date : 2025-11-04 12:52 UTC (10 hours ago)
(HTM) web link (100x.bot)
(TXT) w3m dump (100x.bot)
| sylware wrote:
| HTML <video> or <audio> element with "Streaming" URLs passed to
| the media player (or internally in the web browser for the big
| ones).
| utopiah wrote:
| Have to admit, _ffmpeg_ syntax is not trivial... but also the
| project is 24 years old and is basically the de facto industry
| standard. If you believe you will still be editing videos in 20
| years with the CLI (or any other tool or any programming
| language) wrapping it then it's probably worth a few hours
| learning how it actually works.
| shardullavekar wrote:
| true, companies like Descript, Veed, or Kapwing exist because
| no coders find this syntax intimidating. Plus, a CLI tool
| sits outside a workflow. We wanted to change that.
| petetnt wrote:
| Don't "no coders" find the concepts described in this article
| intimidating?
|
| The article states that whatever the article is trying to
| describe "Takes about ~20-30 mins. The cognitive load is
| high....". while their literal actual step of "Googling
| "ffmpeg combine static image and audio."" gives you the
| literal command you need to run from a known source
| (superuser.com sourced from ffmpeg wiki).
|
| Anyone even slightly familiar with ffmpeg should be able to
| produce the same result in minutes. For someone who doesn't
| understand what ffmpeg is the article means absolutely
| nothing. How does a "no coder" understand what an "agent in a
| sandboxed container" is?
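For reference, the still-image-plus-audio command discussed above is commonly written along the following lines. This is a minimal sketch, not the article's own code: the function name `still_image_video_args` and the file names are hypothetical, and the encoder choices (libx264, AAC at 192k) are assumptions that depend on how the local FFmpeg was built.

```python
import shlex


def still_image_video_args(image, audio, output):
    """Build an ffmpeg argument list that loops one image under an audio track."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the single input image indefinitely
        "-i", image,
        "-i", audio,
        "-c:v", "libx264",       # assumption: libx264 is compiled into this ffmpeg
        "-tune", "stillimage",   # x264 tuning for static content
        "-c:a", "aac",
        "-b:a", "192k",
        "-pix_fmt", "yuv420p",   # broad player compatibility
        "-shortest",             # stop when the shorter input (the audio) ends
        output,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(still_image_video_args("cover.png", "episode.mp3", "episode.mp4")))
```

Passing the list to `subprocess.run` would execute it; printing it first makes it easy to check against the documentation before anything is encoded.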
| shardullavekar wrote:
| we took a basic example and described it. (will try adding
| a complex one)
|
| we have our designer/intern in our minds who creates
| shorts, adds subtitles, crops them, and merges the audio
| generated. He is aware of ffmpeg and prefers using a SaaS
| UI on top of it.
|
| However, we see him hanging out on chatgpt, or gemini all
| the time. He is literally the no coder we have in mind.
|
| We just combined his type what you want + ffmpeg workflows.
| EraYaN wrote:
| Wouldn't that intern just use an NLE (be it Premiere,
| DaVinci Resolve etc) anyway? If you need to style
| subtitles and edit shorts and video content, you'll need
| a proper editor anyway.
| shardullavekar wrote:
| 1. download a larger video from s3.
| 2. Use NLE and cut it into shorts. (crop, resize, subtitles etc.)
| 3. Upload shorts on YouTube, Instagram, Tiktok.
|
| He does use DaVinci Resolve but only for step 2.
|
| NLEs make ffmpeg a standalone yet easy to use tool.
|
| Not denying that major heavy lifting is done by the NLE.
| We go a step ahead and make it embeddable in a larger
| workflow.
| artpar wrote:
| I think that goes with almost every tool you want to use with
| llm. User should already know the tool ideally so mistakes by
| llm can be prevented before they happen.
|
| Here making ffmpeg as "just another capability" allows it to be
| stitched together in workflows
| jack_pp wrote:
| I agree, I suggest using this instead :
| https://github.com/kkroening/ffmpeg-python . While not perfect
| once you figure it out it is far easier to use and you can wrap
| more complicated workflows and reuse them later.
| poly2it wrote:
| Kkroening's wrapper has been inactive for some time. I
| suggest using https://github.com/jonghwanhyeon/python-ffmpeg
| instead. It has proper async support and a better API.
| jack_pp wrote:
| Thing is, if you want to use LLMs for mockups you got to
| use the old one.
| jack_pp wrote:
| Scratch that I thought it was a different version. The one
| you linked has no support for filtergraphs so isn't even
| comparable to the old one.
| esperent wrote:
| The syntax isn't too bad. The problem is that I have to use it
| a couple of times a year, on average. So every time I've
| forgotten and have to relearn. This doesn't happen with GUIs
| nearly as much, and it's why I prefer them over CLI tools for
| anything that I don't do at least once every week or two.
| skydhash wrote:
| That's why you write scripts, or put a couple snippets in
| your notes.
| esperent wrote:
| I do have snippets in my notes. The problem is that nearly
| every time I use it, I need to do something different than
| the previous time.
| Sean-Der wrote:
| My question/curiosity is why do so many people use ffmpeg
| (frustrated by the syntax) when GStreamer is available?
|
| `gst-launch-1.0 filesrc ! qtdemux ! matroskamux ! filesink...`
| people would be less frustrated maybe?
|
| People would also learn a little more and be less frustrated
| when conversations about container/codec/colorspace etc... come
| up. Each has a dedicated element and you can better understand
| its I/O
| artpar wrote:
| I did not know gstreamer wasm also exists, I'll check it out
| goeiedaggoeie wrote:
| Still has a way to go, but very exciting.
| throwaway2046 wrote:
| I haven't tried GStreamer but I found FFmpeg to be extremely
| easy to compile as both a command line tool and library, plus
| it can do so much out of the box even without external
| libraries being present. It's already used in pretty much
| everything and does the job so it never occurred to me (or
| others) to look for alternatives.
| javier2 wrote:
| ffmpeg is pretty complicated, but at least it actually works.
| somat wrote:
| The thing that helped me get over that ffmpeg bump, where you
| go from copying stack overflow answers to actually sort of
| understanding what you are doing is the fairly recent include
| external file syntax. On the surface it is such a minor thing,
| but mentally it let me turn what was a confusing mess into a
| programming language. There are a couple ways to invoke it but
| the one I used was to load the whole file as an arg. Note the
| slash, it is important "-/filter_complex filter_file"
|
| https://ffmpeg.org/ffmpeg-filters.html#toc-Filtergraph-synta...
|
| "A special syntax implemented in the ffmpeg CLI tool allows
| loading option values from files. This is done be prepending a
| slash '/' to the option name, then the supplied value is
| interpreted as a path from which the actual value is loaded."
|
| For how critical that was to getting over my ffmpeg hump, I
| wish it was not buried halfway through the documentation, but
| also, I don't know where else it would go.
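A minimal sketch of the file-loading syntax described above: the filtergraph is saved to a file and handed to ffmpeg through `-/filter_complex`. The helper names, file names, and the tiny example graph are hypothetical, and the syntax is assumed to be available only in recent FFmpeg releases, so it is worth checking the installed version first.

```python
import tempfile
from pathlib import Path


def write_filter_file(directory, graph, name="graph.filter"):
    """Save a filtergraph to a file and return its path."""
    path = Path(directory) / name
    path.write_text(graph, encoding="utf-8")
    return path


def args_with_filter_file(src, filter_path, out_label, output):
    """Build ffmpeg arguments that load the -filter_complex value from a file."""
    return [
        "ffmpeg",
        "-i", src,
        "-/filter_complex", str(filter_path),  # leading slash: read the value from this file
        "-map", out_label,
        output,
    ]


if __name__ == "__main__":
    # One filter per line, which is the readability gain the comment describes.
    graph = "[0:v]\nscale=w=iw/2:h=-2,\nhflip\n[out]\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = write_filter_file(tmp, graph)
        print(args_with_filter_file("in.mp4", path, "[out]", "out.mp4"))
```

Keeping the graph in its own file also means it can be versioned and edited without touching the calling script.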
|
| And just because I am very proud of my accomplishment here is
| the ffmpeg side of my project, motion detection using mainly
| ffmpeg, there is some python glue logic to watch stdout for the
| events but all the tricky bits are internal to ffmpeg.
|
| The filter (comments are added for audience understanding):
|     [0:v]
|     split  #split the camera feed into two parts, passthrough and motion
|     [vis],
|     scale=  #scale the motion feed way down, less cpu and it works better
|       w=iw/4:
|       h=-1,
|     format=  #needed because blend did not work as expected with yuv
|       gbrp,
|     tmix=  #temporal blur to reduce artifacts
|       frames=2,
|     [1:v]  #the mask frame
|     blend=  #mask the motion feed
|       all_mode=darken,
|     tblend=  #motion detect actual, the difference from the last frame
|       all_mode=difference,
|     boxblur=  #blur the hell out of it to increase the number of motion pixels
|       lr=20,
|     maskfun=  #mask it to black and white
|       low=3:
|       high=3,
|     negate,  #make the motion pixels black
|     blackframe=  #puts events on stdout when too many black pixels are found
|       amount=1
|     [motion];  #motion output
|     [vis]
|     tpad=  #delay pass through so you get the start of the event when notified
|       start=30
|     [original];  #passthrough output
|
| and the ffmpeg invocation:
|     ff_args = [
|         'ffmpeg', '-nostats', '-an',
|         '-i', camera_loc,  #a security camera
|         '-i', 'zone_all.png',  #mask as to which parts are relevant for motion detection
|         '-/filter_complex', 'motion_display.filter',  #the filter doing all the work
|         '-map', '[original]',  #sort out the outputs from the filter
|         '-f', 'mpegts',  #I feel a little weird using mpegts but it was the best "streaming" of all the formats I tried
|         'udp://127.0.0.1:8888',  #collect the full video from here
|         '-map', '[motion]',
|         '-f', 'mpegts',
|         'udp://127.0.0.1:8889',  #collect the motion output from here, mainly for debugging
|     ]
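The "python glue logic" that watches for events is not shown in the comment. A minimal sketch of what such glue might look like follows, assuming the `blackframe` filter logs lines containing `frame:`, `pblack:` and `t:` fields; the exact log shape may vary between FFmpeg versions, the function names are hypothetical, and the caller is assumed to read lines from whichever stream carries ffmpeg's log.

```python
import re

# Assumed shape of a blackframe log line; fields may differ between FFmpeg versions.
BLACKFRAME_RE = re.compile(
    r"frame:(?P<frame>\d+)\s+pblack:(?P<pblack>\d+).*?\bt:(?P<t>[0-9.]+)"
)


def parse_blackframe(line):
    """Return (frame, pblack, seconds) for a blackframe log line, else None."""
    match = BLACKFRAME_RE.search(line)
    if match is None:
        return None
    return int(match["frame"]), int(match["pblack"]), float(match["t"])


def motion_events(lines):
    """Yield parsed events from an iterable of log lines, skipping unrelated output."""
    for line in lines:
        event = parse_blackframe(line)
        if event is not None:
            yield event


if __name__ == "__main__":
    sample = [
        "[Parsed_blackframe_9 @ 0x5581] frame:412 pblack:3 pts:1236000 "
        "t:13.733333 type:P last_keyframe:400",
        "frame=  420 fps= 30 q=-1.0 size=N/A",  # ordinary progress output, ignored
    ]
    for event in motion_events(sample):
        print(event)
```

In a live setup, `lines` could be the text stream of a `subprocess.Popen` object started with the argument list above.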
| jack_pp wrote:
| As someone who has used ffmpeg for 10+ years maintaining a
| relatively complex backend service that's basically a JSON to
| ffmpeg translator I did not fully understand this article.
|
| Like the Before vs after section doesn't even seem to create the
| same thing, the before has no speedup, the after does.
|
| In the end it seems they basically created a few services
| ("recipes") that they can reuse to do simple stuff like speed-up
| 2x or combine audio / video or whatever
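To make the "recipe" idea concrete, here is a minimal sketch of what a reusable speed-up recipe might look like. It is not the article's implementation: `speedup_args` and `RECIPES` are hypothetical names, and the sketch assumes the input has both a video and an audio stream and that the requested factor is within the range the local `atempo` filter accepts.

```python
def speedup_args(src, output, factor=2.0):
    """Build ffmpeg arguments that play video and audio `factor` times faster."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    graph = (
        f"[0:v]setpts=PTS/{factor}[v];"  # shrink video timestamps
        f"[0:a]atempo={factor}[a]"       # retime audio without changing pitch
    )
    return [
        "ffmpeg", "-i", src,
        "-filter_complex", graph,
        "-map", "[v]", "-map", "[a]",
        output,
    ]


# A registry of named recipes; a real system would hold many more entries.
RECIPES = {"speedup": speedup_args}


if __name__ == "__main__":
    print(RECIPES["speedup"]("talk.mp4", "talk_2x.mp4"))
```

A recipe here is just a function from a few inputs to an argument list, which is what makes it easy to reuse and to review.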
| shardullavekar wrote:
| thanks for calling it out, I will correct the before vs after
| section. But you can describe any ffmpeg capability in plain
| English and the underlying ffmpeg tool call takes care of it.
| jack_pp wrote:
| I have written a lot of ffmpeg-python and plain ffmpeg
| commands using LLMs and while I am amazed at how well Gemini
| or ChatGPT can handle ffmpeg prompts it is still not 100% so
| this seems to me like a big gamble on your part. However it
| might work for most users that only ask for simple things.
| shardullavekar wrote:
| so creators on 100x will create well defined workflows that
| others can reuse. If a workflow is not found, llm creates
| one on the go and saves it.
| jack_pp wrote:
| That sounds good, save the LLM generated workflows and
| have them edited by more seasoned users.
|
| Or you could go one step further and create a special
| workflow which would allow you to define some inputs and
| iterate with an LLM until the user gets what he wants but
| for this you would need to generate outputs and have the
| user validate what the LLM has created before finally
| saving the recipe.
| shardullavekar wrote:
| That's exactly how it is implemented!
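The loop described in this exchange (reuse a saved workflow, otherwise draft one, let the user validate it, and only then save it) can be sketched as below. Everything here is hypothetical: `store`, `generate` and `approve` stand in for storage, an LLM call and user review, and nothing is known about how the actual product implements it.

```python
def get_or_create_recipe(store, request, generate, approve, max_rounds=3):
    """Return a saved recipe for `request`, or draft one and save it once approved.

    store    -- dict mapping request text to a saved recipe (hypothetical storage)
    generate -- callable(request, feedback) -> candidate recipe (stands in for an LLM)
    approve  -- callable(candidate) -> (ok, feedback) (stands in for user review)
    """
    if request in store:
        return store[request]  # reuse a workflow someone already validated
    feedback = None
    for _ in range(max_rounds):
        candidate = generate(request, feedback)
        ok, feedback = approve(candidate)
        if ok:
            store[request] = candidate  # only validated recipes are saved
            return candidate
    return None  # nothing approved, so nothing is saved


if __name__ == "__main__":
    saved = {}
    draft = lambda request, feedback: f"recipe for {request!r} (feedback: {feedback})"
    accept_all = lambda candidate: (True, None)
    print(get_or_create_recipe(saved, "speed up 2x", draft, accept_all))
```

Saving only after approval keeps unvalidated LLM output out of the shared pool of workflows.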
| IsTom wrote:
| > Half of scripting FFmpeg is just fighting with shell quote
| escaping for filter_complex.
|
| -filter_complex_script is a thing
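Another way around the quoting problem, when ffmpeg is launched from a script, is to skip the shell entirely and pass the arguments as a list. The sketch below assumes a hypothetical logo-overlay graph and hypothetical file names; it only builds and prints the command.

```python
import shlex


def overlay_args(src, logo, output):
    """Build ffmpeg arguments as a list so the filtergraph needs no shell escaping."""
    graph = "[1:v]scale=w=120:h=-1[logo];[0:v][logo]overlay=x=W-w-10:y=10[out]"
    return [
        "ffmpeg", "-i", src, "-i", logo,
        "-filter_complex", graph,  # one list element, passed to ffmpeg untouched
        "-map", "[out]",
        output,
    ]


if __name__ == "__main__":
    args = overlay_args("in.mp4", "logo.png", "out.mp4")
    # subprocess.run(args) would hand each element to ffmpeg as-is, with no shell involved.
    # shlex.join shows the quoting a POSIX shell would need for the same command.
    print(shlex.join(args))
```

The brackets and semicolons in the graph are special to most shells, which is why the printed form comes out quoted while the list form needs nothing extra.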
| 4gotunameagain wrote:
| This is yc propping up a startup they have backed, there isn't
| much substance here.
| coachgodzup wrote:
| I considered FFmpeg a great project, but I usually avoid using
| it directly because of its quite complex syntax. I'm
| reconsidering it because coupled with an llm it is very
| straightforward and more immediate than a usual graphical editor
| orbital-decay wrote:
| At some point command line becomes unwieldy. FFmpeg would
| definitely benefit from a non-arcane DSL like AviSynth or a
| node-based UI.
| skeeter2020 wrote:
| This doesn't make any sense; the Before and After examples
| accomplish different things. I also don't get who the target
| audience is; people intimidated by a CLI tool but at home with
| technical agents?
| shardullavekar wrote:
| people who are intimidated by a CLI tool but find tools like
| chatgpt easy to use, and those who have video editing as a part
| of a larger workflow.
| sanjit wrote:
| An aside but related?
|
| FFmpeg has complex syntax because it's dealing with the
| _complexity of video_. I agree with everyone about knowing (and
| helping create or contribute to) our tools.
|
| Today I largely forget about the _legacy_ of video, the technical
| challenges, and how critical it was to get it right.
|
| There are an incredible number of output formats and
| considerations for _current_ screens (desktop, tablet, mobile,
| tv, etc...). Then we have a whole other world on the creation
| side for capture, edit, live broadcast...
|
| On legacy formats it used to be so complex with standards,
| requirements, and evolving formats. Today, we don't even think
| about why we have 29.97fps around? Interlacing?
|
| We have a mix of so many incredible (and sometimes frustrating)
| codecs, needs and final outputs, so it's really amazing the power
| we have with a tool like FFmpeg... It's daunting but really well
| thought out.
|
| So just a big thanks to the FFmpeg team for all their incredible
| work over the years...
| shardullavekar wrote:
| no 2nd thoughts about it, we are only making ffmpeg more
| accessible and embeddable.
| echelon wrote:
| > FFmpeg has complex syntax because it's dealing with the
| _complexity of video_.
|
| It's dealing with 3D data (more if you count audio or other
| tracks) and multi-dimensional transforms from a command line.
| charcircuit wrote:
| >FFmpeg has complex syntax because it's dealing with the
| _complexity of video_
|
| It's complexity paired with bad design, making the situation
| worse than it could be.
| SpaceManNabs wrote:
| I refuse to admit that ffmpeg is bad design until I see a
| better one. so if you have a better one I am all ears because
| it would surely be very illuminating.
| kwanbix wrote:
| I use ChatGPT for this kind of complexity.
|
| It works 99% of the time for my use case.
| shardullavekar wrote:
| jack_pp made a point in the comments, worth noting.
| Dachande663 wrote:
| ffmpeg is the only community where I've asked for help and been
| told "if you have to ask, you're too stupid to use this project".
| Needless to say, it was a welcoming community I continued
| engaging with.
| pinter69 wrote:
| People in the community can be hardcore there sometimes,
| r/ffmpeg especially. But, there are communities online and
| information resources that help.
|
| This is a nice resource:
| https://amiaopensource.github.io/ffmprovisr/
|
| And also I've written this cheatsheet, which is designed to be
| used alongside an LLM:
| https://github.com/rendi-api/ffmpeg-cheatsheet
|
| Let me know if you're interested in more resources
| oldgregg wrote:
| AI is a game changer for the wildly detailed ffmpeg command line--
| just tell gpt what you want to do and it will spit out the ffmpeg
| command 10/10.
| officeplant wrote:
| FFmpeg continues to be the great filter of those that don't RTFM.
| tartoran wrote:
| Not really, LLMs get it quite right.
| javier2 wrote:
| ffmpeg is awful, except for all the other tools that are awfuller
| and does not even work
| usrxcghghj wrote:
| Read the entire landing page. Still do not understand what 100x
| bot is?
| arjie wrote:
| I just tell Claude Code what I want to do and that it has
| imagemagick and ffmpeg available and it does all the work for me.
| Because it's got an agentic flow, it loops around, checks the
| output and fixes things up.
|
| I can ask it to orient people the right way, crop to the
| important parts, etc. and it will figure out what "the right
| way", "the important parts", etc. are. Sometimes I have to give
| it some light hints like "extract n frames from before y to
| figure out things", but most of the time it just does it.
|
| Claude Code acts like a very general purpose agent for me. About
| the one thing that I have to manually do that I'm annoyed by is
| editing 360 videos into a flow. I'd like to be able to tell
| Claude Code to "follow my daughter as I dunk her in the pool" and
| stuff like that but I have to do that myself in the GoPro editor.
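A hint such as "extract n frames from before y" maps onto a fairly small ffmpeg command. The sketch below is one possible reading of that hint, not what Claude Code actually runs: the function name, the default two-second window and the output pattern are all assumptions.

```python
def frames_before_args(src, timestamp, count, window=2.0, pattern="frame_%03d.png"):
    """Build ffmpeg arguments sampling `count` frames from the `window` seconds before `timestamp`."""
    if count <= 0 or window <= 0:
        raise ValueError("count and window must be positive")
    start = max(0.0, timestamp - window)
    duration = timestamp - start
    if duration <= 0:
        raise ValueError("timestamp must be after the start of the file")
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",                  # seek on the input side, before decoding
        "-i", src,
        "-t", f"{duration:.3f}",                # read only the window of interest
        "-vf", f"fps={count / duration:.6g}",   # resample so roughly `count` frames fall in it
        "-frames:v", str(count),                # hard cap on the number of images written
        pattern,
    ]


if __name__ == "__main__":
    print(frames_before_args("pool.mp4", timestamp=10.0, count=4))
```

The resulting stills can then be inspected, by a person or an agent, before deciding how to crop or rotate the clip.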
___________________________________________________________________
(page generated 2025-11-04 23:00 UTC)