[HN Gopher] Decoding AVIF: Deep dive with cats and imgproxy
___________________________________________________________________
Decoding AVIF: Deep dive with cats and imgproxy
Author : progapandist
Score : 33 points
Date   : 2021-08-15 11:27 UTC (1 day ago)
(HTM) web link (evilmartians.com)
(TXT) w3m dump (evilmartians.com)
| SilverRed wrote:
| I had a really good go at reading this and trying to understand
| it but I feel I don't understand a whole lot more about image
| decoding/encoding than before I started. Like I get the core
| concepts of key frames, motion vectors and such on a high level
| but if you asked me to actually create a decoder I wouldn't have
| a clue where to start.
|
| I feel like I would need a full hour long video on each paragraph
| of this post to really understand it.
| dylan604 wrote:
| "Commonly, three numbers are used to specify downsampling:
| the first is always 4, don't ask me why"
|
| Wow. Someone going to this much detail on explaining how a
| video/image codec works, and cannot bother learning what the
| numbers of chroma subsampling mean?
|
| The first number represents the luminance.[0] Even if they know
| the first number represents luminance, the "don't ask me why" is
| just horrible on its own. The detail in the image is preserved
| through the luminance channel. The subsampling in the chroma is
| much less perceptible to humans, but much more noticeable in the
| luminance. Therefore, some very smart people learned to cheat the
| data saved for chroma, but not the luminance. "don't ask me why"
| in detailed write ups is just bad in so many ways.
|
| [0]https://en.wikipedia.org/wiki/Chroma_subsampling
| jaffathecake wrote:
| Not sure that explains why the first number has to be 4, which
| was their point.
| dylan604 wrote:
| Then they did not look/research very hard. See my response
| above. I provide a link to someone else's blog that was
| easily found with a DDG search.
| [deleted]
| vlovich123 wrote:
| I don't think you're being generous with the author's
| statement, especially since this is in the section within which
| he's describing chroma subsampling. The author is stating "We
| use 4 as a convention. Why is that the convention? No one
| really knows". That seems accurate to me. Do you have a clearer
| answer? Your Wikipedia link doesn't provide any enlightenment
| AFAICT, although maybe I missed the explanation?
| dylan604 wrote:
| Just before this section the author discusses how the image
| is broken down into blocks. This section is where the
| definition of the 4s could have come from, but they left out,
| for brevity's sake I'm assuming, how those blocks are shaped.
|
| "Now, let's break it down the differences between 4:4:4;
| 4:2:2 and 4:2:0:
|
| The number of pixels that share color is determined by what
| type of chroma subsampling it is. Each sample is defined by a
| block of 8 pixels. The first number refers to the size of the
| sample and its pattern, which is typically 4 pixels wide. The
| second number refers to how many pixels in the top row will
| receive color or chroma sampling. The third number shows how
| many pixels on the bottom row will receive chroma samples"[0]
|
| The block sizes and sub-sampling methods are also why there
| are warnings issued when trying to scale an image when the
| dimensions are not divisible by the block sizes. If you try
| to scale to an odd number, then the sampling within the
| blocks is broken. If you scale to a number not divisible
| evenly by the largest block sizes requested, then you also
| get issues.
|
| [0] https://blog.westpennwire.com/what-is-chroma-subsampling
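
As a rough illustration of the J:a:b notation quoted above, here is a small Python sketch that turns a subsampling triple into chroma plane dimensions. The function name `chroma_plane_size`, the two-row reference block, and rounding up for odd dimensions are assumptions made for this example; they are not taken from the article or from any particular codec.

```python
def chroma_plane_size(width, height, j, a, b):
    """Return (chroma_width, chroma_height) for J:a:b subsampling.

    j: width of the reference block in luma samples (usually 4)
    a: chroma samples in the top row of the block
    b: chroma samples in the bottom row (0 means the row reuses the top row)
    """
    if a <= 0 or j % a != 0 or b not in (0, a):
        raise ValueError(f"unsupported subsampling {j}:{a}:{b}")
    h_factor = j // a                # horizontal reduction
    v_factor = 2 if b == 0 else 1    # vertical reduction
    # Ceiling division so odd dimensions still get a chroma sample (assumption).
    chroma_width = -(-width // h_factor)
    chroma_height = -(-height // v_factor)
    return chroma_width, chroma_height


for triple in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    print(triple, chroma_plane_size(1920, 1080, *triple))
```

With a leading 4 and power-of-two ratios, the only horizontal factors available are 1, 2, and 4, which matches the point made further down the thread.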
| LeoPanthera wrote:
| You have not explained why the first number is always 4. (In
| fact, it's not always 4, it just usually is.)
| HALtheWise wrote:
| One thing I never understood is why _downsampling_ is the most
| efficient way to compress the data about chroma into fewer bits
| while maximizing perceptual accuracy. It really seems like for
| any given target bitrate for the chroma data, there should
| always be a more efficient compression scheme available than
| simply throwing out 3/4 of the pixels and running compression
| algorithms on the rest. Surely modern compression can do better
| with a continuous low pass filter or an adaptive compression
| scheme that focuses data on interesting edges or something?
| Maybe someone here can better explain the intuition for this.
| I'm similarly curious for resolution in general (i.e. why does
| 480p upsampled ever look better than 1080p at the same bitrate)
| but chroma seems like a good place to start.
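
For reference, the "throwing out 3/4 of the pixels" step in the comment above can be sketched as a 2x2 box average over one chroma plane. This is a minimal sketch under stated assumptions: the function name `downsample_420` is hypothetical, the plane is a non-empty list of equal-length rows of integers, and a plain box filter is used. Real encoders may use other filters and chroma sample positions.

```python
def downsample_420(plane):
    """Average each 2x2 block of a chroma plane (non-empty list of rows)."""
    height = len(plane)
    width = len(plane[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            # Gather up to four samples; blocks on an odd edge have fewer.
            block = [
                plane[yy][xx]
                for yy in range(y, min(y + 2, height))
                for xx in range(x, min(x + 2, width))
            ]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out


chroma = [
    [100, 102, 50, 50],
    [104, 106, 50, 50],
]
print(downsample_420(chroma))
```

Each output sample stands in for four input samples, so the plane that goes on to the rest of the encoder is a quarter of the original size.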
| Scaevolus wrote:
| JPEG XL doesn't perform chroma subsampling in its native
| color space of XYB.
| https://cloudinary.com/blog/how_jpeg_xl_compares_to_other_im...
| dylan604 wrote:
| >Surely modern compression can do better
|
| I "surely" look forward to your Show HN write up on your new
| compression algorithm. We've been iteratively getting better
| at compression for some time now. It seems like every time it
| looks like we've wrung every bit out of DCT, someone comes up
| with something a little more clever. Wavelets looked promising,
| but never took off.
|
| >why does 480p upsampled ever look better than 1080p at the
| same bitrate
|
| That's a very vague question. Are you stating that you think
| 480p upsampled to 1080p at 1.5Mbps looks better than a source
| at 1080p at 1.5Mbps? I have a hard time believing this to be
| true.
|
| Why the chroma is sub-sampled and not the luminance has to do
| with how the cones/rods in the eyes work. There are a lot of
| things you can get away with when it comes to tricking the
| brain about what it is seeing. Is it better to lose half the
| height or half the width? Is it better to lose more red than
| green or blue?
| cycomanic wrote:
| Just after he says: "4:2:0 is the most popular case. Four luma
| samples per one chroma" so he does understand and write what it
| means. That does not explain why the value is 4.
|
| I assume it is because with 4 you have 3 different subsampling
| ratios (if you want to keep factors of two, which you typically
| want in order to keep algorithms simple).
| keithwinstein wrote:
| Poynton has a pretty plausible-sounding explanation here
| (https://poynton.ca/PDFs/Chroma_subsampling_notation.pdf):
|
| "The commonly used leading digit of 4 is a historical reference
| to a sample rate roughly four times the NTSC or PAL color
| subcarrier frequency; the notation originated when subcarrier-
| locked sampling was under discussion for component video. Upon
| the adoption of component video sampling at 13.5 MHz, the first
| digit came to specify luma sample rate relative to 3 3/8 MHz.
| HDTV was once supposed to be described as 22:11:11! Since then,
| the leading digit has - thankfully - come to be relative to
| the sample rate in use. Until recently, the initial digit was
| always 4, since all chroma ratios have been powers of two - 4,
| 2, or 1. However, 3:1:1 subsampling has been commercialized in
| an HDTV production system (Sony's HDCAM), so 3 may now appear
| as the leading digit. By convention, a leading digit of 2 is
| never used."
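
The arithmetic behind the quoted explanation can be checked with a few lines of Python. The 13.5 MHz and 3 3/8 MHz figures come from the quote; the 74.25 MHz HDTV luma rate, the half-rate chroma, and the helper name `notation_digit` are assumptions added for this sketch.

```python
from fractions import Fraction

# 3 3/8 MHz, the reference rate named in the quote above.
BASE_RATE_MHZ = Fraction(27, 8)


def notation_digit(sample_rate_mhz):
    """Express a sample rate as a multiple of the 3 3/8 MHz reference."""
    return Fraction(sample_rate_mhz) / BASE_RATE_MHZ


# 13.5 MHz luma sampling (from the quote).
sd_luma = notation_digit(Fraction(27, 2))

# Assumption: 74.25 MHz luma sampling for HDTV, chroma at half that rate.
hd_luma = notation_digit(Fraction(297, 4))
hd_chroma = notation_digit(Fraction(297, 8))

print(f"SD leading digit: {sd_luma}")
print(f"HD notation: {hd_luma}:{hd_chroma}:{hd_chroma}")
```

Under those assumptions the standard-definition rate works out to 4 and the HDTV rates to 22 and 11, which is consistent with the 22:11:11 remark in the quote.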
| zinekeller wrote:
| Okay, the _real_ reason, as far as the bundles of paper I have*
| are accurate, is that digital chroma subsampling was first
| invented for MUSE, a Japanese analogue HD video standard (with
| pre-broadcast digital components). They chose four for
| horizontal because it's relatively easy to manipulate using
| their digital systems at the time and two for vertical so that
| it's easy to handle interlacing stuff. Unfortunately, I'm not
| Sony or NHK so I can't say for certain why not eight or any
| other powers of two. Also, Americans (aka the SMPTE) set the
| 1,080 lines (the Japanese standard is 1,025), the 16:9
| compromise (between the European and Japanese 15:9 and cinema
| 21:9) and the "limited RGB" dilemma that is experienced in
| digital video systems (that's literally from the days of NTSC
| signalling!). Both the Japanese NHK/Sony MUSE system and the
| British IBA (adopted as European) D-MAC system use the full-
| range 8-bit system that is used for JPEG (pre-broadcast to
| analogue, of course).
|
| Analogous to this, the reason why CD audio is 44,100 Hz is
| because that's the commonality between NTSC (System M, 525-line
| 480-visible 60-Hz) and 625-line (576-visible 50-Hz) systems.
| Digital audio was literally stored on U-matic systems at the
| time, and it was initially only 14-bit PCM rather than the
| 16-bit PCM of CDs.
|
| * or rather, my employer's mini-library.
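
The CD sample-rate remark above can be worked through numerically. The per-field line counts (245 and 294) and the three audio samples per video line are the figures commonly cited for video-based PCM adaptors; they are not given in the comment, and the function name `pcm_adaptor_rate` is hypothetical.

```python
def pcm_adaptor_rate(fields_per_second, lines_per_field, samples_per_line=3):
    """Audio samples per second when PCM audio is stored as video lines."""
    return fields_per_second * lines_per_field * samples_per_line


# Assumed figures: 245 usable lines per field at 60 fields/s,
# and 294 usable lines per field at 50 fields/s.
rate_60_field = pcm_adaptor_rate(60, 245)
rate_50_field = pcm_adaptor_rate(50, 294)

print(rate_60_field, rate_50_field)
```

Both systems land on the same rate under these assumptions, which is the "commonality" the comment refers to.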
___________________________________________________________________
(page generated 2021-08-16 23:00 UTC)