Lynne's compiled musings
https://github.com/cyanreg https://pars.ee/lynne
IRC: Lynne
22-12-15
Vulkan Video decoding
All video acceleration APIs come in three flavours.
* System-specific
+ DXVA(2), DirectX 12 Video, MediaCodec
+ VAAPI, VDPAU, XvMC, XvBA, YAMI, V4L2, OMX
+ etc..
* Vendor-specific
+ Quick Sync, MFX
+ Avivo, AMF
+ Crystal HD
+ CUVID, NVDEC, NVENC
+ RKMPP
+ Others I'm sure I've forgotten...
* System AND vendor specific
+ Videotoolbox
All of those APIs come with quirks. Some insist that you can only use
up to 31 frames, which is problematic for complex transcoding chains.
Some insist that you preallocate all frames during initialization,
which is memory-inefficient. Most require your H264 stream to be in
Annex-B for no good reason. Some even give you VP9 packets in some
unholy unstandardized Annex-B. Some require that you give them raw
NALs, others want out-of-band NALs and slices, and some do both.
If you wanted to do processing on hardware video frames, your options
were limited. Most of the APIs let you export frames to OpenGL for
presentation. Some of the more benevolent APIs let you import frames
back from OpenGL or DRM. A few of them also let you do OpenCL
processing.
And of course, all of this happened with little to no
synchronization. Artifacts like video tearing, block decoding not
quite being finished, and missing references are commonplace even
nowadays. Since most APIs are stateful, compensating for missing
references or damaged files is difficult.
Finally, Vulkan video attempts to standardize this jungle. Rather
than a compromise, it is low-level enough to describe most quirks of
video acceleration silicon, and, with a single codepath, lets you
decode and encode video with relative statelessness.
Implementation-wise, so far there has been only a single example:
the vk_video_samples repository. As far as example code goes, I
wouldn't recommend it. Moreover, it uses a closed-source parsing
library.
I wrote and maintain the Vulkan code in FFmpeg, so it fell on me to
integrate video decoding and encoding. At the same time, Dave Airlie
started writing a RADV (Mesa's Vulkan code for AMD chips)
implementation. With his invaluable help, in a few weeks (minus some
months of inactivity), we have a working and debuggable open-source
driver implementation, and clean, performant API user code.
Technical aspects
The biggest difference between Vulkan video and other APIs is that
you have to manage memory yourself, specifically the reference frame
buffer. Vulkan calls it the Decoded Picture Buffer (DPB), which is a
rather MPEG-ese term, but fair enough. There are three possible
configurations of the DPB:
* Previous output pictures are usable as references. ^1
* Centralized DPB pool consisting of multiple images. ^2
* Centralized DPB pool consisting of a single image with multiple
layers. ^3
In the first case, you do not have to allocate any extra memory, but
merely keep references of older frames. FFmpeg's hwaccel framework
does this already.
Intel's video decoding hardware supports this behavior.
In the second case, for each output image you create, you have to
allocate an extra image from a pool with a specific image-usage flag.
You give the driver the output image, the output's separate DPB
reference image, and all previous DPB reference images; the driver
then writes to your output frame while simultaneously writing to the
DPB reference image for the current frame.
Recent AMD (Navi21+) and most Nvidia hardware support this mode.
In the third case, the situation is identical to the second case,
only that you have to create a single image upfront with as many
layers as there are maximum references. Then, when creating a
VkImageView, you specify which layer you need based on the DPB slot.
This is a problematic mode, as you have to allocate all references
you may need upfront, even if they're never used. For 8K HEVC video,
that is around 3.2 gigabytes of video RAM.
Older AMD hardware requires this.
Another difference with regards to buffer management is that unlike
other APIs which all managed their own references, with Vulkan, you
have to know which slot in the DPB each reference belongs to. For
H264, this is simply the picture index. For HEVC, after considerable
trial and error, we found it to be the index of the frame in the DPB
array. 'Slot' is not a standard term in video decoding, but in lieu
of anything better, it's appropriate.
Apart from this, the rest is mostly standard. As with NVDEC, VDPAU,
and DXVA, slice decoding is, sadly, not supported, which means you
have to concatenate the data for each slice in a buffer, with start
codes ^4, and then upload the data to a VkBuffer to decode from. This
is somewhat of an issue with very high-bitrate video, but at least
Vulkan lets you have spare and host buffers to work around this.
Unlike other decoding APIs, which let you only set a few SPS, PPS
(and VPS in HEVC) fields, you have to parse and set practically every
single field from those bitstream elements. For HEVC alone, the total
maximum possible struct size for all fields is 114 megabytes, which
means you really ought to pool the structure memory and expand it
when necessary, because although it's unlikely that you will get a
stream using all possible values, anyone can craft one and either
corrupt your output or crash your decoder.
Vulkan video requires that multiplane YUV images are used. Multiplane
images are rather limiting, as they're not well-supported, and if
you'd like to use them to do processing, you have to use DISJOINT
images with an EXTENDED creation flag (to be able to create
VkImageViews with STORAGE usage flags), which are even less supported
and quirky. Originally, the FFmpeg Vulkan code relied entirely on
emulating multiplane images by using separate images per-plane. To
work Vulkan video into this, I initially wrote some complicated
ALIASing code to alias the memory from the separate VkImages to the
multiplane VkImage necessary for decoding. This eventually got messy
enough to make me give up on the idea, and port the entire code to
allow for first-class multiplane support. Some foreknowledge of the
drafting process would have helped, but lacking that, as well as any
involvement in the standardization, refactoring was necessary.
Code
As of 2022-12-19, the code has not yet been merged into mainline
FFmpeg. My branch can be found here. There is still more refactoring
necessary to make multiplane images first-class, which would be good
enough to merge, but for now, it's able to decode both H264 and HEVC
video streams in 8-bit and 10-bit form.
To compile, clone and checkout the vulkan_decode branch:
git clone -b vulkan_decode https://github.com/cyanreg/FFmpeg
To configure, use this line:
./configure --disable-doc --disable-shared --enable-static --disable-ffplay --disable-ffprobe --enable-vulkan
Then type make -j$(nproc) to compile.
To run:
./ffmpeg_g -init_hw_device "vulkan=vk:0,debug=1" -hwaccel vulkan -hwaccel_output_format vulkan -i INPUT -loglevel debug -filter_hw_device vk -vf hwdownload,format=nv12 -c:v rawvideo -an -y OUT.nut
where INPUT is the video file you'd like to decode.
The validation layers are turned on via the debug=1 option.
To decode 10-bit content, you must replace format=nv12 with
format=p010,format=yuv420p10.
To use a different Vulkan device, replace vulkan=vk:0 with
vulkan=vk:N, where N is the index of the device you'd like to use.
This will produce an OUT.nut file containing the uncompressed decoded
data. You can play this using ffplay, mpv or VLC. Alternatively,
there are many resources on how to use the FFmpeg CLI and output
whatever format you'd like.
Non-subsampled 444 decoding is possible, provided drivers enable
support for it.
Driver support
As of 2022-12-19, there are three drivers supporting Vulkan Video.
* RADV
* ANV
* Nvidia Vulkan Beta drivers
For RADV, Dave Airlie's radv-vulkan-video-prelim-decode branch is
necessary.
RADV has full support for Vulkan decoding - 8-bit H264, 8-bit and
10-bit HEVC. The output is spec-compliant.
For installation instructions, check out Dave's blog.
For ANV, his anv-vulkan-video-prelim-decode branch is needed instead.
ANV has partial support for H264 - certain streams may cause crashes.
For installation instructions, check out Dave's blog.
For Nvidia, the Vulkan Beta drivers are necessary. Only Linux has
been tested.
The drivers produce compliant 8-bit HEVC decoding output with my
code. 10-bit HEVC decoding produces slight artifacts, and 8-bit H264
decoding is broken. Nvidia are looking into the issues; progress can
be tracked on the issue thread I made.
State/Future
Currently, I'm working with Dave Airlie on video encoding, which
involves getting a usable implementation and drivers ready. The plan
is to finish video encoding before merging the entire branch into
mainline FFmpeg.
The encoding extension in Vulkan is very low-level but extremely
flexible, unlike all other APIs, which force you onto fixed coding
paths and often bad rate-control systems.
With good user-level code, even suboptimal hardware implementations
could be made competitive with fast software implementations. The
current code ignores the driver's rate control modes, and will
integrate with Daala/rav1e's RC system.
Due to multiplane surfaces being needed for Vulkan encoding and
decoding, Niklas Haas is working on integrating support for them in
libplacebo, which would enable post-processing of decoded Vulkan
frames in FFmpeg, and enable both mpv and VLC to display the decoded
data directly.
In the near future, support for more codecs will hopefully
materialize.
1. When the driver sets VkVideoDecodeCapabilitiesKHR.flags =
   VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR.
2. When the driver sets VkVideoDecodeCapabilitiesKHR.flags =
   VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR.
3. When the driver sets VkVideoDecodeCapabilitiesKHR.flags =
   VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR and
   does NOT set VkVideoCapabilitiesKHR.flags =
   VK_VIDEO_CAPABILITY_SEPARATE_REFERENCE_IMAGES_BIT_KHR.
4. { 0x0, 0x0, 0x1 }, sigh, MPEG-TS's curse never ends.
video * vulkan * CC-BY