[HN Gopher] FFmpeg lands CLI multi-threading as its "most comple...
___________________________________________________________________
FFmpeg lands CLI multi-threading as its "most complex refactoring"
in decades
Author : worble
Score : 429 points
Date : 2023-12-12 15:15 UTC (7 hours ago)
(HTM) web link (www.phoronix.com)
(TXT) w3m dump (www.phoronix.com)
| qrush wrote:
| Is there a recording of this talk from VDD@Dublin? Can't easily
| find it on the author's site or here
| https://www.videolan.org/videolan/events/vdd23/
|
| Update: Found here!
| https://www.youtube.com/watch?v=Z4DS3jiZhfo&t=1221s
| motoboi wrote:
| It's nuts to think that in the near future an LLM will be
| able to do that refactoring in seconds. All we need is a big
| enough context window.
| bsdpufferfish wrote:
| Why would you think this is possible?
| motoboi wrote:
| Based on my current experience with GPT-4. Have you tried
| some sort of refactoring with it? Because I have been
| routinely turning serial scripts into parallel ones with
| success.
|
| Couldn't do the same with larger codebases because the
| context is not enough for the input code and output
| refactoring.
| thfuran wrote:
| Yeah, it's nuts to think that.
| motoboi wrote:
| I'm genuinely confused about your point of view. Have you
| tried refactoring with GPT-4?
|
| I have been refactoring code using GPT-4 for some months now
| and the limiting factor has been the context size.
|
| GPT-4 Turbo now has a 128k context and I can provide it with
| larger portions of the code base for the refactors.
|
| When we have millions of tokens of context, based on what I'm
| experiencing now, I can see that a refactoring like the one
| made in ffmpeg would be possible. Or not? What am I missing
| here?
| rocqua wrote:
| Refactoring is really rather well defined. It's "just
| transformations that are invariant w.r.t. the outcome". The
| reason they are hard to automate is that 'invariant w.r.t.
| the outcome' is a lot more lenient than most semantic models
| can handle. But this kind of well-defined task with a slight
| amount of nuance (and decently checkable) seems pretty well-
| suited to an LLM.
| nolist_policy wrote:
| At least for the Linux kernel, QEMU and other large C
| projects, this is a solved problem with coccinelle[1].
| Compared to AI, it has the added benefit of not making
| incorrect changes, hallucinating stuff, falling for prompt
| injections, or ...
|
| I guess you could use AI to help create a coccinelle
| semantic patch.
|
| [1] https://en.wikipedia.org/wiki/Coccinelle_(software)
| dataangel wrote:
| The part coccinelle does is the part GPT is good at, the
| problem is neither of them actually reason about the code
| bigbillheck wrote:
| Why on earth would you possibly think that?
| ctoth wrote:
| Just as your human intelligence led to you writing the same
| darn comment as another human above you, AI can often write
| the same code as a human would, without having to even bring
| creativity into it! For those of us who write code, this can
| be useful!
| motoboi wrote:
| I'm quite confused by the answers I got from this thread.
| Haven't you tried refactoring with gpt-4 yet?
| bigbillheck wrote:
| > Haven't you tried refactoring with gpt-4 yet?
|
| I most certainly have not. At work, I do greenfield
| development in a specialized problem domain, and I would
| not trust a model (or, for that matter, a junior developer)
| to do any kind of refactor in an acceptable manner. (That
| aside, there's no way I'm going to expose company code to
| any sort of outside party without the approval of upper
| management).
|
| At home, I program for fun and self-improvement, and a big
| part of both is thinking hard about problems. Why would I
| want to wreck that with asking a model to do it for me?
| motoboi wrote:
| Oh, I understand.
|
| What if you have an idea and you tell the computer to
| implement it and then thoroughly check the code?
|
| About the self-improvement part, I suppose you don't
| operate your graphics card yourself, but delegate it to
| your driver.
|
| The LLM is just another tool.
| smabie wrote:
| Oh yea your work is far too sophisticated for an LLM, got
| it
| bigbillheck wrote:
| Some of us do actual creative work, yes.
| airstrike wrote:
| Some people get significance from their ability to write
| code. To them, admitting an LLM can (or will soon be able
| to) do their work inflicts cognitive dissonance, so they
| refuse to believe it. Some refuse to even try it--not
| realizing that refusing to engage does nothing to hinder
| the advancement of the tool they fear.
| mrintegrity wrote:
| Parent post is getting downvoted to oblivion but it seems a
| reasonable belief for someone who is not highly engaged with
| AI. I have only the vaguest understanding of how it works (and
| it's probably wrong) and to my layman mind it also seems like a
| totally fair assumption, based on experience as a user and the
| constant flood of news. Please explain why the suggestion that
| a future AI / sufficiently advanced LLM could refactor a
| complex codebase is so preposterous.
| not_the_fda wrote:
| Because refactoring requires understanding, which LLMs
| completely lack.
| ctoth wrote:
| > Because refactoring requires understanding, which LLMs
| completely lack.
|
| Interesting.
|
| Here's a sample question I just asked the AI:
|
| We are working on a React mapping application. I am going
| to give you two components from this application. Our goal
| is to extract functionality to deduplicate code
|
| We can make a single base component, extract code to hooks,
| or any other strategy which you think is appropriate
|
| Here is travel.tsx:
|
| <snip>
|
| Here is heatmap.tsx:
|
| <snip>
|
| And here is the response:
|
| To deduplicate code and improve maintainability between the
| `TravelMap` and `Heatmap` components in your React mapping
| application, we can identify common functionalities and
| extract them into reusable hooks or components. Here's a
| strategy considering your code:
|
| 1. *Extract Common Hooks*: Both components use similar
| hooks for managing markers, popups, and map references. We
| can create custom hooks that encapsulate these
| functionalities.
|
| 2. *Create Shared Utility Functions*: Functions like
| `initialViewState`, which are identical in both components,
| can be moved to a separate utility file and imported where
| needed.
|
| 3. *Refactor Map Layers and Sources*: The way map layers
| and sources are handled is quite similar in both
| components. A custom hook or component to manage map layers
| and sources would be beneficial.
|
| 4. *Generalize Popup Handling*: The popup handling in both
| components can be abstracted into a custom hook or a
| higher-order component.
|
| 5. *Combine Common Styling*: Shared styles, such as
| `MAP_CONTAINER_STYLE`, can be moved to a common CSS module
| or a styled-component that can be reused.
|
| 6. *Refactor Event Handlers*: Functions like `onMapClick`,
| `onMouseMove`, and `onMouseLeave` have similar structures.
| These can be part of a custom hook or a utility function
| with configurable parameters.
|
| 7. *Legend and Live Region Components*: These components
| are used in a similar way in both `TravelMap` and
| `Heatmap`. Ensure they are designed to be reusable with
| different props.
|
| 8. *Refactor Map Initialization*: The way the map is
| initialized and configured is quite similar. This can be
| abstracted into a custom component or hook.
|
| ### Example Implementations:
|
| #### 1. Custom Hook for Map Markers:
|
| ```typescript
| // useMapMarkers.ts
| import { useState, useCallback } from 'react';
|
| export const useMapMarkers = (initialLat, initialLng, world) => {
|   // Similar marker logic from your components
|   // ...
|   return {
|     marker,
|     onMarkerDrag,
|     onMarkerDragEnd,
|     updateMarkerPosition,
|     // other marker related states and functions
|   };
| };
| ```
|
| #### 2. Shared Utility Function:
|
| ```typescript
| // utils/mapUtils.ts
| export const initialViewState = (map, mapType) => {
|   // Implementation
| };
| ```
|
| #### 3. Custom Hook for Map Layers and Sources:
|
| ```typescript
| // useMapLayers.ts
| import { useState, useEffect } from 'react';
|
| export const useMapLayers = (props) => {
|   // Logic to handle layers and sources
|   // ...
|   return {
|     dataLayer,
|     patternLayer,
|     // other layer related states and functions
|   };
| };
| ```
|
| #### 4. Generalized Popup Handling:
|
| ```typescript
| // useMapPopup.ts
| import { useState } from 'react';
|
| export const useMapPopup = () => {
|   // Popup logic from your components
|   // ...
|   return {
|     popupState,
|     openPopup,
|     closePopup,
|     setPopupData,
|     // other popup related states and functions
|   };
| };
| ```
|
| ### Integration:
|
| Once these hooks and utilities are created, integrate them
| into your `TravelMap` and `Heatmap` components. Replace the
| duplicated logic with calls to these shared resources. This
| will significantly reduce code duplication and make future
| maintenance easier.
|
| ### Testing and Refinement:
|
| After refactoring, thoroughly test both components to
| ensure that all functionalities work as expected. Watch out
| for any edge cases or specific behaviors that might be
| affected by the refactoring.
|
| For those suggestions, I might use five out of eight of
| them, and probably do one or two things differently. But
| you cannot, with a straight face, say the model did not
| understand. It clearly did. It suggested reasonable
| refactors. If being able to refactor means understanding, I
| guess we have understanding!
|
| I could continue with this conversation, ask it to produce
| the full code for the hooks (I have in my custom prompt to
| provide outlines) and once the hooks are complete, ask it
| to rewrite the components using the shared code.
|
| Have you ever used one of these models?
| never_inline wrote:
| Eliminating duplication and cleaning code is a different
| type of refactoring than supporting concurrency, which is
| much much harder.
|
| Cleaning up code also follows some well established
| patterns, performance work is much less pattern-y.
|
| Codebases like FFMPEG are one of a kind. I bet you need
| 10 or 100 times more understanding than the react thing
| you mentioned above.
|
| One day maybe AI can do it, but it probably won't be LLM.
| It would be something which can understand symbols and
| math.
| ctoth wrote:
| Ah, we're having some classic goalpost moving!
|
| > Because refactoring requires understanding, which LLMs
| completely lack.
|
| <demonstration that an LLM can refactor code>
|
| > Cleaning up code also follows some well established
| patterns, performance work is much less pattern-y.
|
| Just as writing shitty react apps follows patterns, low-
| level performance and concurrency work also follow
| patterns. See [0] for a sample.
|
| > I bet you need 10 or 100 times more understanding
|
| Okay, so a 10 or 100 times larger model? Sounds like
| something we'll have next year, and certainly within a
| decade.
|
| > One day maybe AI can do it, but it probably won't be
| LLM. It would be something which can understand symbols
| and math.
|
| You do understand that the reason some of the earlier
| GPTs had trouble with symbols and math was the
| tokenization scheme, completely separate from how they
| work in general, right?
|
| [0]: C++ Concurrency in Action: Practical Multithreading
| 1st Edition https://www.amazon.com/C-Concurrency-Action-
| Practical-Multit...
| kcbanner wrote:
| > Because refactoring requires understanding, which LLMs
| completely lack.
|
| It's obvious from context here that the refactoring that
| was mentioned was specifically around concurrency, not
| simply cleaning up code.
| ctoth wrote:
| So if I show you an LLM implementing concurrency, will
| you concede the point? Is this your true objection?
|
| https://chat.openai.com/share/7c41f59a-c21c-4abd-876c-c95
| 647...
| malcolmgreaves wrote:
| Hope you're looking for good-faith discussion here. I'll
| assume that you're looking for a response where someone
| has taken the time to read through your previous messages
| and also the linked ChatGPT interaction logs.
|
| What you've shown is actually a great example of what folks
| mean when they say that LLMs lack any sort of understanding.
| They're fundamentally predict-the-next-token machines;
| they regurgitate and mix parts of their training data in
| order to satisfy the token prediction loss function they
| were trained with.
|
| In the linked example you provided, *you* are the one
| that needs to provide the understanding. It's a rather
| lengthly back-and-forth to get that code into a somewhat
| useable state. Importantly, if you didn't tell it to fix
| things (sqlite connections over threads, etc.), it would
| have failed.
|
| And while it's concurrent, it's using threads, so it's
| not going to be doing any work in parallel. The example
| you have mixes some IO and compute-bound looking
| operations.
|
| So, if your need was to refactor your original code to
| _actually be fast_, ChatGPT demonstrated it doesn't
| understand nearly enough to actually make this happen.
| This thread conversation got started around correcting the
| misconception that an LLM could ever possess enough
| knowledge to do actually valuable, complex refactoring and
| programming.
|
| While I believe that LLMs can be good tools for a variety
| of usecases, they have to be used in short bursts. Since
| their output is fundamentally unreliable, someone always
| has to read -- then comprehend -- its output. Giving it
| too much context and then prompting it in such a way to
| align its next token prediction with a complex outcome is
| a highly variable and unstable process. If it outputs
| millions of tokens, how is someone going to actually
| review all of this?
|
| In my experience using ChatGPT, GPT4, and a few other
| LLMs, I've found that it's pretty good at coming up with
| little bits to jog one's own thinking and problem
| solving. But doing an actual complex task with lots of
| nuance and semantics-to-be-understood outright? The
| technology is not quite there yet.
| atrus wrote:
| Chess requires understanding, which computers lack. Go
| requires understanding, which computers lack. X requires Y
| which _AI technology today_ lacks. AI is a constantly
| moving goalpost it seems.
| satvikpendem wrote:
| > AI is a constantly moving goalpost it seems.
|
| alwayshasbeen.png
|
| > The AI effect occurs when onlookers discount the
| behavior of an artificial intelligence program by arguing
| that it is not "real" intelligence.[1]
|
| > Author Pamela McCorduck writes: "It's part of the history
| of the field of artificial intelligence that every time
| somebody figured out how to make a computer do something--
| play good checkers, solve simple but relatively informal
| problems--there was a chorus of critics to say, 'that's
| not thinking'."[2] Researcher Rodney Brooks complains:
| "Every time we figure out a piece of it, it stops being
| magical; we say, 'Oh, that's just a computation.'"[3]
|
| > "AI is whatever hasn't been done yet."
|
| > --Larry Tesler
|
| https://en.wikipedia.org/wiki/AI_effect
| The_Colonel wrote:
| It was always clear that games like chess or go can be
| played by computers well, even with simple algorithms,
| because they were completely formalized. The only issue
| was with performance / finding more efficient algorithms.
|
| That's very different from code which (perhaps
| surprisingly) isn't well formalized. The goals are often
| vague and it's difficult to figure out what is intentional
| and what is incidental behavior (esp. with imperative code).
| astrange wrote:
| The ffmpeg tests take a lot more than a few seconds to run,
| and an AI god is still going to have trouble debugging
| multithreaded code.
| dataangel wrote:
| AI is not very good at _single threaded_ code, which is
| widely regarded as much easier. The breathless demos don't
| generalize well when you truly test on data not in the
| training set, it's just that most people don't come up with
| good tests because they take something from the internet,
| which is the training set. But the code most people need to
| write is to do tasks that are bespoke to individual
| businesses/science-experiments/etc not popular CS problems
| that there are 1000 tutorials online for. When you get into
| those areas it becomes apparent really quickly that the AI
| only gets the "vibes" of what code should look like, it
| doesn't have any mechanistic understanding.
| ctoth wrote:
| I see that you got some responses from people who may have not
| even used gpt-4 as a coding assistant, but I absolutely agree
| with you. A larger context window, a framework like Aider, and
| slightly-better tooling so the AI can do renames and other
| high-level actions without having to provide the entire
| changeset as patches, and tests. Lots of tests. Then you can
| just run the migration 15 times, pick the one which passes
| all the tests... run another integration pass to merge ideas
| from the other runs, rinse and repeat. Of course the outer
| loops will themselves be automated.
|
| The trick to this is continuous iteration and feedback. It's
| remarkable how far I've gotten with GPT using these simple
| primitives and I know I'm not the only one.
| beeboobaa wrote:
| If you think a large refactor is just renaming some stuff
| then it makes sense you think this.
| dataangel wrote:
| If you ask GPT to refactor a single threaded program much
| smaller than the context window that is truly out of sample
| into a multithreaded program, it's often going to fail. GPT
| has trouble understanding bit masks in single threaded code,
| let alone multiple threads.
| beeboobaa wrote:
| I have some snake oil to sell you
| PaywallBuster wrote:
| The PDF with the presentation requires a password.
| bsdpufferfish wrote:
| If I'm operating a cloud service like Netflix, then I'm already
| running thousands of ffmpeg processes on each machine. In other
| words, it's already a multi-core job.
| platzhirsch wrote:
| As multi-core as Python and Ruby then.
| bsdpufferfish wrote:
| Yes. The kernel multiplies your efforts for you. It works
| great for web services.
| sanitycheck wrote:
| Curious, what would that many ffmpeg processes be doing at
| Netflix? I assume new VOD content gets encoded once per format,
| and the amount of new content added per day is not gigantic.
|
| Agree with the general premise, of course, if I've got 10
| different videos encoding at once then I don't need additional
| efficiency because the CPU's already maxed out.
| asveikau wrote:
| Probably a lot more than once when you consider that
| different devices have different capabilities, and that they
| might stream you different bitrates depending on conditions
| like your network capability, screen resolution, how much
| you've paid them..
|
| You could also imagine they might apply some kind of
| heuristic to decide to re-encode something based on some
| condition... Like fine tune encoder settings when a title
| becomes popular. No idea if they do that, just using some
| imagination.
| The_Colonel wrote:
| I assume they re-compress for each resolution / format, quite
| possibly they also have different bitrate levels per
| resolution. Potentially even variants tweaked for certain
| classes of device (in case this is not already covered by a
| combination of format/resolution/bitrate). I would also
| assume they re-compress with new advances in video processing
| (things like HDR, improved compression).
|
| Also, their devs likely want fast feedback on changes - I
| imagine they might have CI running changes on some standard
| movies, checking various stats (like SNR) for regressions.
| Everybody loves if their CI finishes fast, so you might want
| to compress even a single movie in multiple threads.
| sanitycheck wrote:
| They'll be doing VBR encodes to DASH, HLS & (I guess still)
| MSS which covers the resolutions & formats... DRM will be
| what prevents high res content from working on some "less-
| trusted" platforms so the same encodes should work.
|
| (Plus a couple more "legacy" encodes with PIFF instead of
| CENC for ancient devices, probably.)
|
| New tech advances, sure, they probably do re-encode
| everything sometimes - even knocking a few MB off the size
| of a movie saves a measurable amount of $$ at that scale.
| But are there frequent enough tech advances to do that more
| than a couple of times a year..? The amount of difficult
| testing (every TV model group from the past 10 years, or
| something) required for an encode change is horrible. I'm
| sure they have better automation than anyone else, but I'm
| guessing it's still somewhat of a nightmare.
|
| Youtube, OTOH, I really can imagine having thousands of
| concurrent ffmpeg processes.
| canucker2016 wrote:
| Why bring up assumptions/suppositions about Netflix's
| encoding process?
|
| Their tech blog and tech presentations discuss many of
| the requirements and steps involved for encoding source
| media to stream to all the devices that Netflix supports.
|
| The Netflix tech blog: https://netflixtechblog.com/ or
| https://netflixtechblog.medium.com/
|
| Netflix seems to use AWS CPU+GPU for encoding, whereas
| YouTube has gone to the expense of producing an ASIC to
| do much of their encoding.
|
| 2015 blog entry about their video encoding pipeline:
| https://netflixtechblog.com/high-quality-video-encoding-
| at-s...
|
| 2021 presentation of their media encoding pipeline:
| https://www.infoq.com/presentations/video-encoding-
| netflix/
|
| An example of their FFmpeg usage - a neural-net video
| frame downscaler: https://netflixtechblog.com/for-your-
| eyes-only-improving-net...
|
| Their dynamic optimization encoding framework -
| allocating more bits for complex scenes and fewer bits
| for simpler, quieter scenes:
| https://netflixtechblog.com/dynamic-optimizer-a-
| perceptual-v... and
| https://netflixtechblog.com/optimized-shot-based-encodes-
| now...
|
| Netflix developed an algorithm for determining video
| quality - VMAF, which helps determine their encoding
| decisions: https://netflixtechblog.com/toward-a-
| practical-perceptual-vi...,
| https://netflixtechblog.com/vmaf-the-journey-
| continues-44b51..., https://netflixtechblog.com/toward-a-
| better-quality-metric-f...
| astrange wrote:
| > Their dynamic optimization encoding framework -
| allocating more bits for complex scenes and fewer bits
| for simpler, quieter scenes:
| https://netflixtechblog.com/dynamic-optimizer-a-
| perceptual-v... and
| https://netflixtechblog.com/optimized-shot-based-encodes-
| now...
|
| This is overrated - of course that's how you do it, what
| else would you do?
|
| > Mean-squared-error (MSE), typically used for encoder
| decisions, is a number that doesn't always correlate very
| nicely with human perception.
|
| Academics, the reference MPEG encoder, and old
| proprietary encoder vendors like On2 VP9 did make
| decisions this way because their customers didn't know
| what they wanted. But people who care about quality, i.e.
| anime and movie pirate college students with a lot of
| free time, didn't.
|
| It looks like they've run x264 in an unnatural mode to
| get an improvement here, because the default "constant
| ratefactor" and "psy-rd" always behaved like this.
| slhck wrote:
| > This is overrated - of course that's how you do it,
| what else would you do?
|
| That's not what has been done previously for adaptive
| streaming. I guess you are referring to what encoding
| modes like CRF do for an individual, entire file? Or
| where else has this kind of approach been shown before?
|
| In the early days of streaming you would've done constant
| bitrate for MPEG-TS, even adding zero bytes to pad "easy"
| scenes. Later you'd have selected 2-pass ABR with some
| VBV bitrate constraints to not mess up the decoding
| buffer. At the time, YouTube did something where they
| tried to predict the CRF they'd need to achieve a certain
| (average) bitrate target (can't find the reference
| anymore). With per-title encoding (which was also
| popularized by Netflix) you could change the target
| bitrates for an entire title based on a previous
| complexity analysis. It took quite some time for other
| players in the field to also hop on the per-title
| encoding train.
|
| Going to a per-scene/per-shot level is the novelty here,
| and exhaustively finding the best possible combination of
| QP/resolution pairs for an entire encoding ladder that
| also optimizes subjective quality - and not just MSE.
| astrange wrote:
| > exhaustively finding the best possible combination of
| QP/resolution pairs for an entire encoding ladder that
| also optimizes subjective quality - and not just MSE.
|
| This is unnecessary if the encoder is well-written. It's
| like how some people used to run multipass encoders 3 or
| 4 times just in case the result got better. You only need
| one analysis pass to find the optimal quality at a
| bitrate.
| slhck wrote:
| Sure, the whole point of CRF is to set a quality target
| and forget about it, or, with ABR, to be as good as you
| can with an average bitrate target (under constraints).
| But you can't do that across resolutions, e.g. do you
| pick the higher bitrate 360p version, or the lower
| bitrate 480p one, considering both coding artifacts and
| upscaling degradation?
| astrange wrote:
| At those two resolutions you'd pick the higher resolution
| one. I agree that generation of codec doesn't scale all
| the way up to 4K and at that point you might need to make
| some smart decisions.
|
| I think it should be possible to decide in one shot in
| the codec though. My memory is that codecs (image and
| video) have tried implementing scalable resolutions
| before, but it didn't catch on simply because dropping
| resolution is almost never better than dropping bitrate.
| canucker2016 wrote:
| You're letting the video codec make all the decisions for
| bitrate allocation.
|
| Netflix tries to optimize the encoding parameters per
| shot/scene.
|
| from the dynamic optimization article:
|
| - A long video sequence is split in shots ("Shots are
| portions of video with a relatively short duration,
| coming from the same camera under fairly constant
| lighting and environment conditions.")
|
| - Each shot is encoded multiple times with different
| encoding parameters, such as resolutions and qualities
| (QPs)
|
| - Each encode is evaluated using VMAF, which together
| with its bitrate produces an (R,D) point. One can convert
| VMAF quality to distortion using different mappings; we
| tested against the following two, linearly and inversely
| proportional mappings, which give rise to different
| temporal aggregation strategies, discussed in the
| subsequent section
|
| - The convex hull of (R,D) points for each shot is
| calculated. In the following example figures, distortion
| is inverse of (VMAF+1)
|
| - Points from the convex hull, one from each shot, are
| combined to create an encode for the entire video
| sequence by following the constant-slope principle and
| building end-to-end paths in a Trellis
|
| - One produces as many aggregate encodes (final operating
| points) by varying the slope parameter of the R-D curve
| as necessary in order to cover a desired bitrate/quality
| range
|
| - Final result is a complete R-D or rate-quality (R-Q)
| curve for the entire video sequence
| banana_giraffe wrote:
| It's been reported in the past that Netflix encodes 120
| different variants of each video they have [1] for different
| bitrates and different devices' needs.
|
| And that was years ago, I wouldn't be surprised to learn it's
| a bigger number now.
|
| [1] https://news.ycombinator.com/item?id=4946275
| The_Colonel wrote:
| I guess it's irrelevant for Netflix then*. But it sounds great
| for the remaining 99.99%.
|
| * I would be very surprised if Netflix even uses vanilla ffmpeg
| bsdpufferfish wrote:
| > But it sounds great for the remaining 99.99%.
|
| I believe the vast majority of ffmpeg usage is in web
| services or one-off encodings.
| The_Colonel wrote:
| Well, this feature is awesome for one-off encoding by a
| home user.
|
| Subjectively, me compressing my holiday video is much more
| important than Netflix re-compressing a million of them.
| buu700 wrote:
| I use ffmpeg all the time, so this change is much
| appreciated. Well not really _that_ often, but when I do
| encode video/audio it's generally with ffmpeg.
| kevincox wrote:
| Latency is still valuable. For example YouTube (which IIRC uses
| ffmpeg) often takes hours to do transcodes. This is likely
| somewhat due to scheduling but assuming that they can get the
| same result doing 4x threads for 1/4 of the time they would
| prefer that as each job finishes faster. The only real question
| is at what efficiency cost the latency benefit stops being
| worth it.
| lelanthran wrote:
| I think that if you're operating at the scale of Google using
| a single-threaded ffmpeg will finish your jobs in less time.
|
| If you have a queue of 100k videos to process and a cluster
| of 100 cores, assigning a video to each core as it becomes
| available is the most efficient way to process them, because
| you're skipping the thread-joining time.
|
| Anytime there is a queue of jobs, assigning the next job in
| the queue to the next free core is always going to be faster
| than assigning the next job to multiple cores.
| Thaxll wrote:
| YouTube does not use ffmpeg; at the scale at which they
| operate it would be too slow / expensive.
|
| They use custom hardware just for encoding.
|
| FYI they have to transcode over 500h of video per minute. So
| multiply that by all the formats they support.
|
| They operate at an insane scale, Netflix looks like a garage
| project for comparison.
| astrange wrote:
| There's still decoding. If a service claims to support all
| kinds of weird formats (like a MOV or AVI from the 90s)
| that means ffmpeg is running.
| canucker2016 wrote:
| Google's use of ffmpeg:
| https://multimedia.cx/eggs/googles-youtube-uses-ffmpeg/
|
| For encoding, recently, they've built their own ASIC to
| deal with H264 and VP9 encoding (for 7-33x faster
| encoding compared to CPU-only):
| https://arstechnica.com/gadgets/2021/04/youtube-is-now-
| build...
| stevehiehn wrote:
| Okaaay, and if I'm not operating a cloud service like Netflix,
| and I'm not running thousands of ffmpeg processes? In other
| words, it's not already a multi-core job?
| brucethemoose2 wrote:
| Meanwhile, I've been enjoying threaded filter processing in
| VapourSynth for nearly a decade.
|
| Not that this isn't great. It's fantastic. But TBH it's not really
| going to change my workflow of VapourSynth preprocessing + av1an
| encoding for "quality" video encodes.
| tetris11 wrote:
| I believe I have too with gstreamer's pipe framework for
| threading, but ffmpeg's syntax has stuck in my mind far longer
| than any of the elaborate setups I built with gstreamer. I'm
| excited for this development
| dylan604 wrote:
| FFMPEG does so much more than just video encoding. I use ffmpeg
| all day every day, and only a fraction of the time do I
| actually make a video.
| m3kw9 wrote:
| Like what do you do?
| andoma wrote:
| One can use it instead of cat to display text files. Easy
| syntax to remember:
|
|     ffmpeg -v quiet -f data -i file.txt -map 0:0 -c text -f data -
| DonHopkins wrote:
| Good thing it's now multi-threaded so it can process all
| those command line arguments in parallel!
| whalesalad wrote:
| I'm dying.
| nerpderp82 wrote:
| https://www.youtube.com/watch?v=9kaIXkImCAM
| whalesalad wrote:
| I'm glad we've reached a point where there is quality
| parody content online for our industry.
| ElijahLynn wrote:
| THIS!!! It was so refreshing!
| Rebelgecko wrote:
| Check out Krazam. I quote their Microservices video on a
| regular basis (https://youtu.be/y8OnoxKotPQ)
| danudey wrote:
| "Do you know ffmpeg supports OCR? I haven't found the
| command yet, but it does support it."
|
| This is probably 80% of my experience with ffmpeg, to be
| honest, but the other 20% is invaluable enough anyway.
| ElijahLynn wrote:
| That was one of the funniest things I've seen in a
| while!!!! I had to stop drinking my decaf for fear of
| spitting it all over my computer I was laughing out loud
| so much!
|
| (ps: and no, it's not Rick Astley/Never Gonna Give You
| Up)
| nerpderp82 wrote:
| The artfully inserted, corrupted predicted frames were
| :chefskiss:
| dkjaudyeqooe wrote:
| I bet ffmpeg special cases that combination of flags and
| calls cat.
| yoz wrote:
| can I use ffmpeg to embed a gif in a Hacker News comment,
| because I want that so much right now
| jasomill wrote:
| No, but you can use ffmpeg to create a GIF from ASCII art
| embedded in a Hacker News comment:
|
|     $ ffmpeg -v quiet -codecs | egrep -i 'gif|ascii'
|      D.V.L. ansi          ASCII/ANSI art
|      DEV..S gif           CompuServe GIF (Graphics Interchange Format)
|
| ("D" and "E" in the first field indicate support for
| decoding and encoding)
| faitswulff wrote:
| Ah now I can replace all my useless uses of cat with
| ffmpeg
| fransje26 wrote:
| I use dd for that:
|
|     dd if=./file.txt
|
| Can you also format your drive with ffmpeg? I'm looking
| for a more versatile dd replacement..
| jasomill wrote:
| It can't create partition tables or filesystems, so no, but
|
|     ffmpeg -f data -i /dev/zero -map 0:0 -c copy -f data - > /dev/sda
|
| is roughly equivalent to
|
|     dd status=progress if=/dev/zero of=/dev/sda
| dylan604 wrote:
| you might need a -disposition default type option,
| otherwise it introduces some abnormal behavior
| jasomill wrote:
| That doesn't work[1], but
|
|     ffmpeg -v quiet -f data -i file.txt -map 0:0 -c copy -f data -
|
| does.
|
| [1] "Encoder 'text' specified, but only '-codec copy'
| supported for data streams"
| polonbike wrote:
| Besides video conversion/compression? Sound extraction or
| processing, image processing, video casting or streaming,
| anything related to image/multimedia formats, basically.
| whalesalad wrote:
| you can make GIFs with it
| starkparker wrote:
| I've used it for video and audio concatenation of laserdisc
| game segments, transcoding audio clips for gamedev,
| programmatically generating GIFs of automatically generated
| video clips from tests in a CI pipeline, ripping songs and
| audio clips from YouTube videos to ogg/mp3, creating GIFs
| from burst-shot and time-lapse photography (and decimating
| them), excerpting clips from a video without re-encoding,
| and compressing or transforming audio on remote servers
| where VLC wasn't and couldn't be installed.
| bbkane wrote:
| Sounds like you already have a process for most of this,
| but I found https://github.com/mifi/editly to be
| incredibly helpful to run ffmpeg and make my little time
| lapse video. Could be useful for others
| dylan604 wrote:
| ffmpeg can produce an amazing amount of analysis
| ThrowawayTestr wrote:
| I use ffmpeg every time I download a YouTube video.
| brucethemoose2 wrote:
| Vapoursynth can be used for image processing too (albeit more
| clumsily with variable size input), and it's also a great way
| to hook into PyTorch.
| Thaxll wrote:
| Can it fix broken files?
| PreachSoup wrote:
| Can you run doom on it?
| j1elo wrote:
| Interesting! I'm among today's lucky 10,000 in learning for the
| first time about VapourSynth.
|
| How come it only has 4 measly entries on HN, and none got any
| traction? I've posted a new entry, just for the curiosity of
| others.
| aidenn0 wrote:
| I'm guessing from context that VapourSynth is a frame-server in
| the vein of avisynth? If so, does it run on Linux? Avisynth was
| the single biggest thing I missed when moving to Linux about 20
| years ago.
|
| [edit]
|
| found the docs; it's available on Linux[1]. I'm definitely
| looking into it tonight because it can't be _worse_ than
| writing ffmpeg CLI filtergraphs!
|
| 1: http://www.vapoursynth.com/doc/installation.html#linux-
| insta...
| brucethemoose2 wrote:
| Yep, and it's so much better than ffmpeg CLI that it's not even
| funny.
|
| This is a pretty good (but not comprehensive) db of the
| filters: https://vsdb.top/
| naikrovek wrote:
| I don't understand why you would want to piggyback on this
| story to say this.
|
| are people just itching for reasons to dive into show & tell or
| to wax poetic about how _they_ have solved the problem for
| _years_? I really don't understand people at all, because I
| don't understand why people do this. And I'm sure I've done it,
| too.
| tetris11 wrote:
| There is hype for FEAT. People who have achieved similar FEAT
| perk up their heads but say nothing.
|
| Hype for FEAT is beyond sensibility. People with similar FEAT
| bristle at this and wish that their projects received
| even a fraction of FEAT's hype.
|
| I think it's normal.
| naikrovek wrote:
| not gonna define FEAT, then? ok.
| tetris11 wrote:
| ...in this case, multi-threading. In other cases; AI
| workflows that others commercialize, a new type system in
| a language that already exists in another, a new sampling
| algorithm that has already existed by another name for
| decades, a permaculture innovation that farmers have been
| using for aeons, the list goes on...
| naikrovek wrote:
| just say "feature".
|
| language is for communicating. don't impede that
| communication by using unnecessary terms.
| brucethemoose2 wrote:
| Not gonna lie, I think VapourSynth has been flying under the
| radar for far too long, and is an awesome largely unused
| alternative to ffmpeg filter chains in certain cases. I don't
| see any harm in piggybacking on an ffmpeg story to bring it
| up, especially if readers find it useful.
|
| It's been threaded since its inception, so it seems somewhat
| topical.
| badrabbit wrote:
| When I stream 4k from my laptop ffmpeg gets very intense about
| cpu usage to the point fans are constantly at high speed and it's
| distracting. I hope this helps in some way. I have a fairly
| decently specced mid-tier laptop.
| hereme888 wrote:
| I believe ffmpeg can be compiled to support the GPU, if your
| laptop has one. It works at least for CUDA-enabled GPUs
| (https://docs.nvidia.com/video-technologies/video-codec-
| sdk/1...)
|
| Talk with ChatGPT about it and see if you can do it.
| isatty wrote:
| ???
|
| Just Google it.
| badrabbit wrote:
| Thanks, I don't think it has a separate GPU that is
| CUDA-enabled, but I have other systems that do; will look
| into it.
| kevincox wrote:
| I've always wondered if better multi-core performance can come
| from processing different keyframe segments separately.
|
| IIUC all current encoders that support parallelism work by
| multiple threads working on the same frame at the same time.
| Often times the frame is split into regions and each thread
| focuses on a specific region of the frame. This approach can have
| a (usually small) quality/efficiency cost and requires per-
| encoder logic to assemble those regions into a single frame.
|
| What if instead/additionally different keyframe segments are
| processed independently? So if keyframes are every 60 frames
| ffmpeg will read 60 frames pass that to the first thread, the
| next 60 to the next thread, ... then assemble the results
| basically by concatenating them. It seems like this could be used
| to parallelize any codec in a fairly generic way and it should be
| more efficient as there is no thread-communication overhead or
| splitting of the frame into regions which harms cross-region
| compression.
|
| Off the top of my head I can only think of two issues:
|
| 1. Requires loading N*keyframe period frames into memory as well
| as the overhead memory for encoding N frames.
|
| 2. Variable keyframe support would require special support as the
| keyframe splits will need to be identified before passing the
| video to the encoding threads. This may require extra work to be
| performed upfront.
|
| But both of these seem like they won't be an issue in many cases.
| Lots of the time I'd be happy to use tons of RAM and output with
| a fixed keyframe interval.
|
| Probably I would combine this with intra-frame parallelization
| such as process every frame with 4 threads and then run 8
| keyframe segments in parallel. This way I can get really good
| parallelism but only minor quality loss from 4 regions rather
| than splitting the video into 32 regions which would harm quality
| more.
| Hello71 wrote:
| your idea also doesn't work with live streaming, and may also
| not work with inter-frame filters (depending on
| implementation). nonetheless, this exists already with those
| limitations: av1an and I believe vapoursynth work more or less
| the way you describe, except you don't actually need to load
| every chunk into memory, only the current frames. as I
| understand, this isn't a major priority for mainstream encoding
| pipelines because gop/chunk threading isn't massively better
| than intra-frame threading.
| kevincox wrote:
| It can work with live streaming, you just need to add N
| keyframes of latency. With low-latency livestreaming
| keyframes are often close together anyways so adding say 4s
| of latency to get 4x encoding speed may be a good tradeoff.
| bagels wrote:
| 4s of latency is not acceptable for applications like live
| chat
| kevincox wrote:
| As I said, "may be". "Live" varies hugely with different
| use cases. Sporting events are often broadcast live with
| 10s of seconds of latency. But yes, if you are talking to
| a chat in real-time a few seconds can make a huge
| difference.
| mort96 wrote:
| Well, you don't add 4s of latency for 4x encoding speed
| though. You add 4s of latency for very marginal
| quality/efficiency improvement and significant encoder
| simplification, because the baseline is current frame-
| parallel encoders, not sequential encoders.
|
| Plus, computers aren't quad cores any more, people with
| powerful streaming rigs probably have 8 or 16 cores; and
| key frames aren't every second. Suddenly you're in this
| hellish world where you have to balance latency, CPU
| utilization and encoding efficiency. 16 cores at a not-so-
| great 8 seconds of extra latency means terrible efficiency
| with a key frame every 0.5 second. 16 cores at good
| efficiency (say, 4 seconds between key frames) means
| terrible 64 second of extra latency.
| kevincox wrote:
| > except you don't actually need to load every chunk into
| memory, only the current frames.
|
| That's a good point. In the general case of reading from a
| pipe you need to buffer it somewhere. But for file-based
| inputs the buffering concerns aren't relevant, just the
| working memory.
| dbrueck wrote:
| Actually, not only does it work with live streaming, it's not
| an uncommon approach in a number of live streaming
| implementations*. To be clear, I'm not talking about low
| latency stuff like interactive chat, but e.g. live sports.
|
| It's one of several reasons why live streams of this type are
| often 10-30 seconds behind live.
|
| * Of course it also depends on where in the pipeline they
| hook in - some take the feed directly, in which case every
| frame is essentially a key frame.
| cudder wrote:
| I know next to nothing about video encoders, and in my naive
| mind I absolutely thought that parallelism would work just like
| you suggested it should. It sounds absolutely wild to me that
| they're splitting single frames into multiple segments. Merging
| work from different threads for every single frame sounds
| wasteful somehow. But I guess it works, if that's how everybody
| does it. TIL!
| astrange wrote:
| Most people concerned about encoding performance are doing
| livestreaming and so they can't accept any additional
| latency. Splitting a frame into independent segments (called
| "slices") doesn't add latency / can even reduce it, and it
| recovers from data corruption a bit better, so that's usually
| done at the cost of some compression efficiency.
| seeknotfind wrote:
| Video codecs often encode the delta from the previous frame,
| and because this delta is often small, it's efficient to do it
| this way. If each thread needed to process the frame
| separately, you would need to make significant changes to the
| codec, and I hypothesize it would cause the video stream to be
| bigger in size.
| keehun wrote:
| The parent comment referred to "keyframes" instead of just
| "frames". Keyframes--unlike normal frames--encode the full
| image. That is done in case the "delta" you mentioned could
| be dropped in a stream ending up with strange artifacts in
| the resulting video output. Keyframes are where the codec
| gets to press "reset".
| seeknotfind wrote:
| Oh right. For non realtime, if you're not IO bound, this is
| better. Though I'd wonder how portable the codec code
| itself would be.
| actionfromafar wrote:
| The encoder has a lot of freedom in _how_ it arrives at
| the encoded data.
| danielrhodes wrote:
| Isn't that delta partially based on the last keyframe? I
| guess it would be codec dependent, but my understanding is
| that keyframes are like a synchronization mechanism where the
| decoder catches up to where it should be in time.
| astrange wrote:
| In most codecs the entropy coder doesn't reset across
| frames, so there is enough freedom that you can do
| multithreaded decoding. ffmpeg has frame-based and slice-
| based threading for this.
|
| It also has a lossless codec ffv1 where the entropy coder
| doesn't reset, so it truly can't be multithreaded.
| 0x457 wrote:
| Yes, key frames are fully encoded, and some delta frames
| are based on the previous frame (which could be keyframe or
| another delta frame). Some delta frames (b-frames) can be
| based on next frame instead of previous. That's why
| sometimes you could have a visual glitch and mess up the
| image until the next key frame.
|
| I'd assume if each thread is working on its own key frame,
| it would be difficult to make b-frames work? Live content
| also probably makes it hard.
| rokweom wrote:
| There's already software that does this:
| https://github.com/master-of-zen/Av1an Encoding this way should
| indeed improve quality slightly. Whether that is actually
| noticeable/measurable... I'm not sure.
| rnnr wrote:
| ffmpeg and x265 allow you to do this too. frame-threads=1
| will use 1 thread per frame, addressing the issue OP
| mentioned without a big perf penalty, in contrast to the
| 'pools' switch, which sets the threads to be used for
| encoding.
| jamal-kumar wrote:
| I've messed around with av1an. Keep in mind the software used
| for scene chunking, L-SMASH, is only documented in Japanese
| [1], but it does the trick pretty well as long as you're not
| messing with huge dimensions like HD VR where you have video
| dimensions that do stuff like crash quicktime on a mac
|
| [1] http://l-smash.github.io/l-smash/
| PatronBernard wrote:
| IIUC - International Islamic University Chittagong?
| nolist_policy wrote:
| IIUC - If I understand correctly.
| KineticLensman wrote:
| If I Understand Correctly
| bmurphy1976 wrote:
| This definitely happens. This is how videos uploaded to
| Facebook or YouTube become available so quickly. The video is
| split into chunks based on key frame, the chunks are farmed out
| to a cluster of servers and encoded in parallel, and the
| outputs are then re-assembled into the final file.
| pier25 wrote:
| So does this mean that FFMPEG will be able to use multiple cores
| with all the included codecs?
|
| I'm using FFMPEG to encode MP3 with LAME for an audio hosting
| service and it would be great to improve encode times for long
| files.
| pseudosavant wrote:
| Doubtful. Many codecs like MP3 aren't well suited to efficient
| multi-threaded encoding.
| mrbuttons454 wrote:
| Will this allow multiple machines to process the same file? If
| not is there anything out there that can?
| muragekibicho wrote:
| Shameless plug. I teach the FFmpeg C API here:
| https://www.udemy.com/course/part-1-video-coding-with-ffmpeg...
| 3abiton wrote:
| That's such a niche topic to teach. What use cases?
| muragekibicho wrote:
| It's for engineers tired of memorizing long weird CLI
| commands. I teach you the underlying C data structures so you
| can get out of command line hell and make the most out of
| your time!
| ElijahLynn wrote:
| This must have been quite the challenge to continually rebase the
| ongoing changes coming in on the daily. Wow. Now that it is
| actually in, it should be much easier to go forward.
|
| Big win too! This is going to really speed things up!
| shp0ngle wrote:
| I don't know anything about the ffmpeg codebase, but I just wonder...
| how would I go about doing this _slowly_ without completely doing
| a giant commit that changes everything?
|
| The presentation says it's 700 commits. Was that a separate
| branch? Or was it slowly merged back to the project?
|
| Well I can look at github I guess
| shp0ngle wrote:
| It seems ffmpeg uses the mailing list patch way of doing "PRs",
| which is... well it is what it is. It doesn't help me
| understand the process unless I just go through all the mailing
| list archives, I guess.
| asylteltine wrote:
| Ugh why? That is so old school
| _joel wrote:
| Like the linux kernel?
| shp0ngle wrote:
| I mean they might be used to doing that, as ffmpeg predates
| github (and git).
|
| as long as it works for them...
| ajhai wrote:
| This will hopefully improve the startup times for FFmpeg when
| streaming from virtual display buffers. We use FFmpeg in LLMStack
| (low-code framework to build and run LLM agents) to stream
| browser video. We use playwright to automate browser interactions
| and provide that as a tool to the LLM. When this tool is invoked,
| we stream the video of these browser interactions with FFmpeg by
| streaming the virtual display buffer the browser is using.
|
| There is a noticeable delay booting up this pipeline for each
| tool invoke right now. We are working on putting in some
| optimizations but improvements in FFmpeg will definitely help.
| https://github.com/trypromptly/LLMStack is the project repo for
| the curious.
| bane wrote:
| A theory about this that may also affect other older solid
| software: the assumptions made on where to optimally "split" a
| problem for multi-threading/processing have likely changed over
| time.
|
| It wasn't that long ago that reading, processing, and rendering
| the contents of a single image took a noticeable amount of time.
| But both hardware and software techniques have gotten
| significantly faster. What may have made sense many years ago
| (lots of workers on a frame) may not matter today when a single
| worker can process a frame or a group of frames more efficiently
| than the overhead of spinning up a bunch of workers to do the
| same task.
|
| But where to move that split now? Ultra-low-end CPUs now ship
| with multiple cores and you can get over 100 easily on high-end
| systems, system RAM is faster than ever, interconnect moves
| almost a TB/sec on consumer hardware, GPUs are in everything, and
| SSDs are now faster than the RAM I grew up with (at least on
| continuous transfer). Basically the systems of today are entirely
| different beasts to the ones commonly on the market when FFmpeg
| was created.
|
| This is tremendous work that requires lots of rethinking about
| how the workload needs to be defined, scheduled, distributed,
| tracked, and merged back into a final output. Kudos to the team
| for being willing to take it on. FFmpeg is one of those "pinnacle
| of open source" infrastructure components that civilizations are
| built from.
| MightyBuzzard wrote:
| It's not the codecs that were multithreaded in this release.
| Pretty much all modern codecs are already multithreaded. What
| they decided to parallelize is ffmpeg itself. You know, the
| filter graphs and such. They didn't do anything to the codecs
| themselves.
| vfclists wrote:
| All without Rust?
| 71a54xd wrote:
| Random reach here, but has anyone here managed to get FFmpeg to
| render JS text over a video? I've been thinking about this
| workflow and just haven't quite figured it out yet, only a
| prototype in MoviePy but I'd like to move away from that.
| sylware wrote:
| I think this was not "basic" multi-threading: they were careful
| about keeping latency as low as possible and some internal
| modifications of ffmpeg libs had to be done.
|
| That said, I don't think we get input buffering yet (for HLS).
| Const-me wrote:
| The Intel Core Duo CPU was released in 2006. By then it was
| obvious that computationally intensive programs needed
| multithreading, and that plain Unix-style processes were no
| longer adequate.
|
| I wonder why it took so long for FFmpeg?
|
| BTW, MS Media foundation is a functional equivalent of FFmpeg. It
| was released as a part of Windows Vista in 2006, and is heavily
| multithreaded by design.
| atif089 wrote:
| Does this mean that my FFmpeg H264 encoding until now was
| single-threaded?
___________________________________________________________________
(page generated 2023-12-12 23:00 UTC)