[HN Gopher] Linux kernel VP9 codec V4L2 control interface
       ___________________________________________________________________
        
       Linux kernel VP9 codec V4L2 control interface
        
       Author : mfilion
       Score  : 77 points
       Date   : 2021-09-14 13:54 UTC (9 hours ago)
        
 (HTM) web link (lkml.iu.edu)
 (TXT) w3m dump (lkml.iu.edu)
        
       | CameronNemo wrote:
        | Unfortunately, userspace support for this (e.g. in VAAPI,
        | ffmpeg) is not done yet. Until VAAPI support is implemented,
        | videos in Firefox will be unaccelerated. I think it is the
        | same deal for Chrome.
        
         | miduil wrote:
          | Wouldn't the gstreamer support that is mentioned in the
          | patch description directly enable hardware acceleration in
          | Firefox? Or do I misunderstand to what extent Firefox is
          | using gstreamer at the moment?
        
           | CameronNemo wrote:
           | Firefox does not use gstreamer at all AFAIK.
        
             | miduil wrote:
              | Ah, I confused it with ffmpeg, which also uses VAAPI of
              | course.
              | 
              | Also, gstreamer was at some point added for something,
              | but that was 7 years ago; I guess getting video decoding
              | running was a different story.
             | 
             | https://wiki.mozilla.org/index.php?title=Special:Search&lim
             | i...
        
         | fragileone wrote:
         | HW acceleration on Linux was fixed about a year ago
         | https://9to5linux.com/firefox-81-enters-beta-gpu-acceleratio...
        
           | [deleted]
        
           | CameronNemo wrote:
           | Firefox uses VA-API. That library does not support this
           | hardware.
           | 
           | Edit: as explained below, the linked work is for specific ARM
           | hardware like the rk3399 SoC.
        
         | zajio1am wrote:
         | > Until VAAPI support is implemented, videos in Firefox will be
         | unaccelerated. I think it is the same deal for Chrome.
         | 
          | That is an issue with Firefox; other software (mplayer, mpv)
          | has supported VAAPI for many years. And with youtube-dl
          | integration in mpv, why even play videos in Firefox?
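
For context, a typical mpv invocation along these lines (the URL is a
placeholder, and whether `vaapi` actually engages depends on the driver
and codec; mpv resolves YouTube links through youtube-dl/yt-dlp when
one of them is installed):

```shell
# Play a YouTube video directly in mpv, asking for VA-API hardware
# decoding; mpv falls back to software decoding if the driver does
# not support the stream's codec.
mpv --hwdec=vaapi --ytdl-format='bestvideo+bestaudio' \
    'https://www.youtube.com/watch?v=EXAMPLE_ID'
```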
        
           | CameronNemo wrote:
           | Nope not an issue in Firefox, an issue in VAAPI. Firefox
           | supports VAAPI just fine, VAAPI does not support this
           | hardware/API. Considering it is a new API, I am still holding
           | out hope that support gets added.
        
             | megous wrote:
             | https://github.com/noneucat/libva-v4l2-request#branch=fix-
             | ke...
             | 
             | And there's probably some branch somewhere that supports
             | VP9 too.
        
         | fguerraz wrote:
         | The usefulness of hardware acceleration for video decoding is
         | highly debatable.
         | 
          | 1) It's not always much more energy efficient; sometimes it
          | is, but less than you'd think, since GPUs need power too
         | 
          | 2) It greatly increases the complexity of client software,
          | which has to implement both accelerated and unaccelerated
          | decoding, leading to poorer software quality
         | 
          | 3) Driver quality is usually terrible: lists of working
          | hardware/software combinations have to be maintained and, in
          | some cases, holes in sandboxes have to be punched [1]
         | 
          | 4) HW support usually lags behind state-of-the-art encoding.
          | YouTube is already using AV1, but the vast majority of
          | devices won't support it in hardware before something else
          | comes along
         | 
         | 5) Highly optimised decoders, such as dav1d, are extremely
         | effective and save bandwidth and power compared to HW VP9.
         | 
          | EDIT: I'm mostly talking about the desktop/laptop use case
          | here, where things are very fragmented. On a mobile phone,
          | where manufacturers control hardware and software end to
          | end, it's a different story.
         | 
         | [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1698778
        
           | FpUser wrote:
           | >"2) It increases greatly the complexity of client software
           | that has to implement both accelerated and unaccelerated
           | decoding, leading to poorer software quality"
           | 
            | I happen to have my own product with just that: software
            | and HW accelerated decoding. It plays videos in a few
            | resolutions, and the presence of HW acceleration allowed
            | me to play 4K videos (first on the market in my segment)
            | with close to 0% CPU consumption on low-end PCs.
            | Competitors at that stage would not even dream of offering
            | 4K content.
           | 
            | As to "poorer software quality" - please do not spread
            | FUD. I just looked at the source code - the HW accelerated
            | path (decodes from source to a DirectX texture) added a
            | minuscule 1200 lines of code, a good chunk of which is
            | headers / declarations. The software is used by tens of
            | thousands of clients and I have about zero reports where
            | enabling HW decoding has led to an error.
        
           | zajio1am wrote:
           | > The usefulness of hardware acceleration for video decoding
           | is highly debatable.
           | 
            | Disagree. On low-end hardware the advantages are clear. On
            | my older Intel NUC I can play 1080p H.264 (using mpv) hw-
            | accelerated with 15% CPU load, or software-decoded with
            | 75% CPU load. In the first case the NUC is silent; in the
            | second case the core temperature rises and eventually its
            | fan starts spinning.
        
             | antisthenes wrote:
             | > On my older Intel NUC i can play 1080p H.264 (using mpv)
             | hw-accelerated with 15% cpu load, or software decoded with
             | 75% cpu load
             | 
             | These numbers are meaningless without measuring watt-hours
             | used for the task.
             | 
              | I was able to play 1080p H.264 video with hardware
              | acceleration on an 8800 GS with an Athlon X2 5000 with
              | about the same CPU utilization, back in 2008-2009. There
              | was a special library (shareware) that enabled HW
              | acceleration way before it was commonplace on integrated
              | GPUs. I forget what it was called, but it was
              | Nvidia/CUDA only.
             | 
             | That was 12+ years ago.
             | 
             | Obviously GPUs have become more efficient since then, but
             | so have the CPUs. It also matters how the video stream was
             | encoded for efficiency. It's entirely possible that under
             | certain options, hardware decoding's advantages are almost
             | entirely negated.
        
               | kimixa wrote:
               | Also there's "levels" of hardware acceleration - using
               | CUDA (or any other shader-level acceleration) will always
               | be less efficient than a dedicated hardware block.
               | 
                | And there are multiple steps in decoding a video -
                | some steps in some codecs fit different acceleration
                | schemes better, so a full-pipeline hardware decode may
                | not be worth the hardware cost at one point; later,
                | transistors get cheaper or new HW decode techniques
                | are discovered, so more steps can be done in dedicated
                | hardware blocks. Those hardware blocks may also have
                | hard limits - if one can only (say) cope with 1080p60
                | at a certain profile level for a codec, trying to do
                | anything beyond that will likely skip the HW block
                | completely, since it's hard to do any kind of "hybrid"
                | decode if a whole pipeline step isn't covered.
               | 
               | "HW Video Decode Acceleration" isn't a simple boolean.
        
           | kllrnohj wrote:
           | > The usefulness of hardware acceleration for video decoding
           | is highly debatable.
           | 
           | No it isn't. There's a reason it's used on 99% of consumer
           | devices. Hardware companies are generally not in the business
           | of adding to the BOM cost for no reason. Linux alone is the
           | outlier.
           | 
           | > It's not always much more energy efficient, but it
           | sometimes is, but less than you'd think, GPUs need power too
           | 
           | "As you can see a GPU enabled VLC is 70% more energy
           | efficient than using the CPU!"
           | 
           | https://devblogs.microsoft.com/sustainable-software/vlc-
           | ener...
           | 
           | chrome-hw showing 1/4th the power consumption of chrome-sw on
           | the same video on more recent Apple M1:
           | https://singhkays.com/blog/apple-silicon-m1-video-power-
           | cons...
           | 
           | Also hardware decoders have consistent performance, which is
           | not true of CPU-based decoders. This is especially
           | problematic & obvious at high resolutions. Windows & MacOS
           | ultrabooks can do 4k video all day long without an issue.
           | Linux ultrabooks get noticeably choppy at 1440p and 4k is
           | right out.
           | 
           | This is also why you'll find ultra-low end SoCs regularly
           | prioritizing hardware decoders over faster CPUs, notably
           | those in every smart TV & the majority of TV streaming
           | dongles/sticks/boxes. Which really shouldn't be surprising,
           | fixed-function hardware has _always_ been drastically more
           | efficient than programmable hardware, and video has changed
           | nothing about that.
           | 
           | > 2) It increases greatly the complexity of client software
           | that has to implement both accelerated and unaccelerated
           | decoding, leading to poorer software quality
           | 
           | Sounds like a job for a library, which is how every other OS
           | makes this a non-issue.
           | 
           | > 4) HW support usually lags behind state of the art
           | encoding. Youtube is already using av1, but the vast majority
           | of devices won't support it in hardware before something else
           | comes up
           | 
           | Youtube also still uses VP9 so that power efficiency didn't
           | regress on existing hardware, and mid-tier TV SoCs with AV1
           | decoder support are already here (such as the Amlogic
           | S905X4). Sony's 2021 BRAVIA XR line also has HW AV1 decoders
           | up to 4k.
           | 
           | > 5) Highly optimised decoders, such as dav1d, are extremely
           | effective and save bandwidth and power compared to HW VP9.
           | 
            | Care to back that up with a source? I can find statements
            | that dav1d is fast, but no evidence that it is power-
            | efficient. The only thing I can find is this:
            | https://visionular.com/en/av1-encoder-optimization-
            | from-the-...
           | 
           | which has dav1d using more power than ffmpeg-h264 but less
           | than openhevc, but those are also software decoders which
           | similar to the above take _significantly_ more power than
           | hardware decoders for the same codecs.
        
             | [deleted]
        
           | tau255 wrote:
           | Disagree.
           | 
            | I can run multiple 1080p Twitch streams with mpv using
            | streamlink and appropriate decoder flags, while using
            | chromium to watch even one stream puts a lot of strain on
            | my laptop and gets the fan running immediately.
            | 
            | So from my perspective it is very useful to offload video
            | decoding to the GPU and leave CPU cycles for other work.
            | Is it more energy efficient? I never checked, but the GPU
            | fan does not really spin any faster, and looking at the
            | temperature graphs it does not seem to really strain it.
           | 
           | I tried enabling gpu acceleration for browser (chromium
           | based) and I still don't really know why it is so flaky and
           | unreliable.
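
As an aside, a streamlink-plus-mpv setup like the one described might
look as follows (the channel name is a placeholder, and `--hwdec=auto`
is just one reasonable choice of decoder flag):

```shell
# Hand a Twitch stream to mpv and request hardware decoding; with
# --hwdec=auto, mpv probes the available hardware decoders and falls
# back to software decoding if none works for the stream.
streamlink --player "mpv --hwdec=auto" \
    https://twitch.tv/some_channel best
```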
        
           | pantalaimon wrote:
           | > 2) It increases greatly the complexity of client software
           | that has to implement both accelerated and unaccelerated
           | decoding, leading to poorer software quality
           | 
            | Only if you are not using any abstraction layers.
            | GStreamer should take care of using a hardware decoder if
            | one is available, and otherwise fall back to software
            | decoding.
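
For illustration, GStreamer's playbin element does this selection
automatically (the file path is a placeholder; which decoder actually
gets picked depends on the plugins installed on the system):

```shell
# playbin ranks the decoder elements it finds; a hardware decoder
# plugin (e.g. V4L2- or VA-API-based) outranks the software one
# when it is present and supports the stream.
gst-launch-1.0 playbin uri=file:///path/to/video.webm

# Inspect which VP9-capable elements GStreamer knows about:
gst-inspect-1.0 | grep -i vp9
```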
        
           | brigade wrote:
           | Hybrid decoders that use GPU shaders are somewhat rare; HW
           | decoding pretty much always means "ASIC". And ASIC power draw
           | for decoders is typically in the <1W range.
           | 
           | For dav1d, even YouTube-tier 1080p SW decoding is using +4-5W
           | on my laptop, and 4k60 is +15-20W.
        
             | fguerraz wrote:
             | Yes, again, I'm talking about PCs here, where it's usually
             | implemented in shaders.
        
               | CameronNemo wrote:
               | But the email is about ARM SoCs with dedicated VPU IP
               | blocks.
        
               | kllrnohj wrote:
               | No it isn't. "NVDEC" is an actual ASIC block in the GPU
               | silicon. It's not "shaders". Same with AMD's VCN. And
               | Intel's QuickSync.
               | 
               | If it was just shaders then there'd be basically no
               | concerns with driver quality or hardware support, just
               | like there aren't with CPU decoders.
        
               | brigade wrote:
               | So was I? Which phone can even achieve a 20W power
               | draw...
               | 
               | The only hybrid VP9 decoders were AMD's that only
               | supported Windows, which they stopped shipping years ago
               | (any current/Linux AMD drivers that support VP9 decoding
               | only do so via an ASIC), and Intel's that was only
               | supported on 3 generations of GPUs (Gen7.5, Gen8, and
               | Gen9) and is obsoleted with an ASIC in Gen9.5.
        
             | AshamedCaptain wrote:
             | > ASIC power draw for decoders is typically in the <1W
             | range.
             | 
              | Many times even "standalone" HW decoders use or share
              | GPU components (e.g., almost always the memory). Just
              | bumping up the GPU's memory controller clock already
              | consumes >10W on my system.
        
             | Arnavion wrote:
             | >HW decoding pretty much always means "ASIC"
             | 
             | Indeed. For example, hardware decoding is the difference
             | between choppy video and smooth video on the PinePhone
             | because the CPU isn't powerful enough and the GPU is
             | useless for decoding.
             | 
             | (And to fguerraz's edit that their comment doesn't apply to
             | mobile phones "where manufacturers control hardware and
             | software end to end", the manufacturer does not control the
             | software on the PinePhone.)
        
           | [deleted]
        
           | megous wrote:
           | This is kernel API for VPUs not for GPUs.
           | 
            | The power reduction is not really questionable. You can't
            | achieve smooth playback at full resolution without the VPU
            | on devices where these things are used.
        
       | qwerty456127 wrote:
        | I just wonder how the presence of a niche codec in the kernel
        | affects kernel size and performance.
        | 
        | I would be glad if everybody used it so it would become
        | mainstream, but the reality is H.264 and H.265.
        
         | rjsw wrote:
         | This is a hardware driver that conforms to a standard
         | interface, it doesn't implement a VP9 codec in software in the
         | kernel.
        
         | marcodiego wrote:
          | This is a driver interface. If it is not standardized, we
          | get that ugly situation where a userspace app only works
          | with hardware from a specific vendor.
        
         | dcgudeman wrote:
         | youtube uses VP9 so I wouldn't call it niche.
        
           | _joel wrote:
           | Yes, unless you force it to h264 it'll default to vp9, as,
           | well, google.
        
             | qwerty456127 wrote:
              | Many videos are not available in VP9. I noticed this a
              | couple of years ago when I had to use vanilla Ubuntu
              | without the "install 3rd-party software" checkbox
              | checked during installation - Firefox refused to play
              | many YouTube videos.
              | 
              | It also supposedly makes sense to force H.264 to
              | increase the chances of hardware acceleration being
              | used.
        
       | rememberlenny wrote:
        | Could someone explain the significance of this, why it took
        | so long, and what it opens up?
        
         | rkangel wrote:
          | VP9 is an open, royalty-free video codec. It was developed
          | by Google to provide a free alternative to codecs like H.264
          | and H.265; implementing either of those requires paying
          | licence fees.
          | 
          | Codecs can be implemented in software, but also in hardware,
          | which is much more power efficient. This change gives the
          | Linux kernel a standard interface to hardware VP9 decoders,
          | so that userspace software can decode (play) VP9 video much
          | more efficiently when that hardware is available.
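
Whether a board exposes such a decoder can be checked from userspace
with v4l2-utils (the device node is a placeholder; numbering varies
per board, and the compressed format appears on the decoder's OUTPUT
queue):

```shell
# List the V4L2 devices the kernel registered.
v4l2-ctl --list-devices

# For a memory-to-memory decoder, the coded (input) formats are on
# the OUTPUT queue; look for a VP9 entry here.
v4l2-ctl -d /dev/video0 --list-formats-out
```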
        
           | [deleted]
        
         | maggit wrote:
         | It looks like it implements hardware acceleration of the VP9
         | codec for some specific hardware (Rockchip VDEC and Hantro G2).
         | This opens up playing, for example, lots of YouTube videos with
         | less CPU usage on devices with that hardware. I can't comment
         | on whether or not it "took so long" as I have no idea which
         | hardware this is.
         | 
         | The title makes it out to be something fundamental in Linux,
         | but this is just one driver becoming more complete.
        
           | megous wrote:
           | It's a new media subsystem userspace API, not just a new
           | driver, and the API will be stable from the get go, instead
           | of languishing in the staging area, like the H.264 one.
        
             | CameronNemo wrote:
             | Is the h264 one going to move out of staging anytime soon?
        
       | marcodiego wrote:
        | AFAIK rk3399 is special in this area: its codecs need no
        | binary blobs. This means it can encourage other vendors to do
        | the same and get RYF-certified. ARM SBCs based on rk3399 could
        | become the only modern affordable systems with RYF
        | certification.
        
         | rjsw wrote:
         | I don't think the Allwinner codecs need binary blobs.
        
         | Teknoman117 wrote:
          | Now if only the standard release for the PineBook Pro would
          | use a kernel newer than 5.7 so we could get hardware codecs.
          | 
          | Might just have to sit down and figure out how to cross-
          | compile Gentoo for it.
        
           | marcodiego wrote:
           | Try armbian
        
           | yjftsjthsd-h wrote:
           | You don't need to go full Gentoo to install a custom kernel.
           | From Manjaro you could even pull the linux-mainline AUR
           | package and just build+install that (or linux-git or any of
           | the others), if you want the easy way out.
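
For reference, building an AUR kernel package by hand looks roughly
like this (package name taken from the comment; this assumes makepkg
is run on the target machine or in an aarch64 build environment):

```shell
# Fetch the AUR recipe, then build and install it as a regular
# package; makepkg compiles the whole kernel, which takes a while
# on a PineBook Pro.
git clone https://aur.archlinux.org/linux-mainline.git
cd linux-mainline
makepkg -si
```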
        
             | Teknoman117 wrote:
             | It was more of a "there are a ton of packages which also
             | need to be patched" kind of thing as well.
        
           | CameronNemo wrote:
           | I mean compiling your own kernel and compiling the OS are
           | very different. I am running 5.14 on my PBP right now.
           | 
           | No support for external displays, though.
        
       ___________________________________________________________________
       (page generated 2021-09-14 23:01 UTC)