[HN Gopher] The QOI File Format Specification
___________________________________________________________________
The QOI File Format Specification
Author : Aissen
Score : 164 points
Date : 2021-12-20 14:10 UTC (8 hours ago)
(HTM) web link (phoboslab.org)
(TXT) w3m dump (phoboslab.org)
| Aissen wrote:
| The experiment that led to this file format (QOI: Lossless Image
| Compression in O(n) Time) was discussed on HN:
|
| https://news.ycombinator.com/item?id=29328750
| pcwalton wrote:
| Neat, this is impressive!
|
| The first thing that comes to mind is that it's not very amenable
| to parallel encoding or decoding, even less so than PNG. The fact
| that QOI_OP_INDEX could reach arbitrarily far back basically
| eliminates parallel decompression. But I guess the simplest
| workaround for that would be to just break up your images into
| tiles that are each decoded in parallel. You could imagine a
| simple wrapper format that specifies how to reconstruct a larger
| image out of such tiles, which would be amenable to parallelism
| (albeit still not easily implementable on GPU, though I guess you
| could wring a little bit of parallelism out of it by using
| one GPU SIMD lane per RGBA channel).
|
| Of course, it's entirely possible that your decompression goes so
| fast it doesn't matter that it's sequential and you're
| bottlenecked on I/O in any reasonable use case...
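| (For reference: the index in question is QOI's running array of
| 64 recently-seen pixels, addressed by a small hash, so a one-byte
| QOI_OP_INDEX can match any earlier pixel still resident in the
| array. A Python sketch, following the hash given in the spec:

```python
# QOI's running index: 64 slots of recently seen pixels, addressed
# by the hash from the spec. QOI_OP_INDEX is a one-byte reference
# into this array, but each slot's contents depend on every pixel
# decoded before it, which is what defeats naive parallel decoding.
def qoi_index_position(r, g, b, a):
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64

index = [(0, 0, 0, 0)] * 64
for px in [(255, 0, 0, 255), (0, 255, 0, 255), (255, 0, 0, 255)]:
    if index[qoi_index_position(*px)] == px:
        pass  # encoder could emit QOI_OP_INDEX (third pixel here)
    index[qoi_index_position(*px)] = px
```

| A tile wrapper as described above would simply reset this array,
| and the rest of the decode state, at each tile boundary.)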
| CodesInChaos wrote:
| The end marker feels awkward:
|
| * It shouldn't be necessary, since having encoded all pixels
| already implies you're at the end.
|
| * It complicates the encoding/decoding of the indexing operation.
| edflsafoiewq wrote:
| It doesn't really complicate the encoder. Any real encoder
| would never emit 7 OP_INDEX chunks for the same index; it would
| use an RLE instead.
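| (A sketch of that run-length branch, following the spec's
| QOI_OP_RUN chunk: tag 0b11 in the top two bits, run lengths 1..62
| stored with a -1 bias. The handling of literal pixels is
| simplified to a placeholder here:

```python
# Sketch of a QOI-style run-length pass. Repeats of the previous
# pixel collapse into a QOI_OP_RUN byte: tag 0b11 in the top two
# bits, run length 1..62 stored with a -1 bias (the values 63 and
# 64 are reserved for the QOI_OP_RGB/RGBA tags). Literal pixels
# are kept as-is, standing in for the other chunk types.
def emit_runs(pixels):
    out = []
    prev, run = None, 0
    for px in pixels:
        if px == prev and run < 62:
            run += 1
        else:
            if run:
                out.append(0xC0 | (run - 1))  # QOI_OP_RUN
            run = 1 if px == prev else 0      # run hit the 62 cap
            if px != prev:
                out.append(px)                # literal stand-in
            prev = px
    if run:
        out.append(0xC0 | (run - 1))
    return out

# three 5s become: literal 5, then a run of 2 (0xC0 | 1)
assert emit_runs([5, 5, 5, 9]) == [5, 0xC1, 9]
```

| so repeated identical pixels go through this branch, never
| through repeated OP_INDEX bytes.)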
| formerly_proven wrote:
| This is very nice.
|
| If I had to nitpick something, I'd say choosing all-big-endian is
| kinda odd in 2021, but it's probably not a biggie even for
| something like this.
| phoboslab wrote:
| Well, it's easier to read in a hex editor and seemed like the
| right thing to do for a "wire format".
|
| The only places where it matters are the header's width/height
| and the order of the RGB(A) data. Storing RGBA as ABGR is not a
| great choice, imho. If you want to be endian-independent, you
| have to read one byte at a time anyway (or swap).
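| (The byte-at-a-time read is endian-independent by construction;
| a Python sketch for clarity, though the same shifts work in C:

```python
# Reading a big-endian 32-bit field one byte at a time; the result
# is the same on any host, with no swap step.
def read_u32_be(buf, off=0):
    return (buf[off] << 24) | (buf[off + 1] << 16) \
         | (buf[off + 2] << 8) | buf[off + 3]

# e.g. a width field of 768 as it would sit in a QOI header
assert read_u32_be(bytes([0x00, 0x00, 0x03, 0x00])) == 768
```

| )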
| wheybags wrote:
| Storing rgba as abgr is a _terrible_ choice, but are you sure
| they're actually doing that?
| phoboslab wrote:
| Sorry, I see how my previous comment was misleading. To
| clarify: No, QOI is not doing that. QOI stores RGBA as
| R,G,B,A because it is big endian.
|
| Little endian would imply that RGBA is stored as A,B,G,R.
| Which was my argument against using LE for a file format.
| Taywee wrote:
| I don't believe it would imply that, because those are
| independent elements. RGBA would still be stored as R, G,
| B, A unless you're considering each pixel as a single
| 32-bit integer. A C structure for the colors would even
| live in memory in the order you expect. Just like a
| string is still stored byte-by-byte even in little-endian
| contexts.
|
| LE vs BE would only affect multi-byte integers and
| floats, not arrays or structures of single-byte elements.
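| The distinction can be made concrete, with Python's struct
| module standing in for a C cast of the whole pixel:

```python
import struct

r, g, b, a = 0x11, 0x22, 0x33, 0x44

# An array/struct of single bytes has one layout everywhere:
as_bytes = bytes([r, g, b, a])                   # R, G, B, A

# Only when the pixel is treated as ONE 32-bit integer does
# endianness decide the byte order in memory:
pixel = (r << 24) | (g << 16) | (b << 8) | a     # "0xRRGGBBAA"
assert struct.pack('>I', pixel) == as_bytes              # BE: R,G,B,A
assert struct.pack('<I', pixel) == bytes([a, b, g, r])   # LE: A,B,G,R
```
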
| jandrese wrote:
| ARGB would be nice if you want to integrate with Cairo.
| mgaunard wrote:
| why are channels interleaved at all if speed of decompression
| was the goal?
| user-the-name wrote:
| Because almost anywhere you would use the data expects it
| interleaved. Separating the channels would require a
| second pass to re-interleave the data.
| gpvos wrote:
| The bytes in the compressed data aren't going to be aligned to
| 2-, 4- or 8-byte offsets, so the overhead is going to be
| nonexistent or negligible.
| thechao wrote:
| Support for 16/32b and fp-formats would've been nice -- the
| compressor is actually orthogonal to the bit width of the
| channels. Custom fp-compression usually just involves moving
| the sign bit into the LSB, and delta encoding the mantissa
| separately from the exponent -- although the latter isn't as
| important.
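| (A sketch of that sign-bit trick, with hypothetical helper names
| and no particular codec in mind: rotate the IEEE-754 bits left by
| one so the sign lands in the LSB; exponent/mantissa delta coding
| would then run on the rotated words:

```python
import struct

# Move a float32's sign bit into the LSB via a rotate-left by 1,
# a common pre-pass before delta-coding exponent and mantissa.
def rotate_sign_to_lsb(x):
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return ((bits << 1) | (bits >> 31)) & 0xFFFFFFFF

def rotate_sign_back(u):
    bits = ((u >> 1) | ((u & 1) << 31)) & 0xFFFFFFFF
    return struct.unpack('<f', struct.pack('<I', bits))[0]

# lossless round trip; sign ends up in the low bit
assert rotate_sign_back(rotate_sign_to_lsb(-2.25)) == -2.25
assert rotate_sign_to_lsb(-1.5) & 1 == 1
```

| )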
| adgjlsfhk1 wrote:
| This was my initial thought, but given how the format is
| designed, you would pretty much have to make a whole new set
| of tags for 16 or 32 bit images.
| thechao wrote:
| Yeah... seems like a small family would be in the right
| vein. I'm immediately drawn to a block format, but that'd
| probably double the complexity.
| MisterTea wrote:
| Honestly, no one should worry about endianness in 2021.
| cassepipe wrote:
| https://justine.lol/endian.html
| MisterTea wrote:
| Yup, as soon as you see anything that checks platform
| endianness, uses memcpy, or assumes memory layout, run.
| jonsneyers wrote:
| See also:
| https://twitter.com/jonsneyers/status/1472959101416226823?t=...
|
| On my laptop, QOI encodes a test corpus at about 78 MP/s to a
| density of about 8.0 bits per pixel. Fast png encoding with fpng
| is 71 MP/s and reaches about 7.8 bpp. Fast lossless jxl encoding
| can be done at 162 MP/s (single-core, faster with more cores) and
| reaches about 6.8 bpp. It can also be decoded in parallel, unlike
| QOI and PNG. It doesn't have a single-page spec though :)
| adgjlsfhk1 wrote:
| Note that the reference QOI implementation is not speed
| focused. Some of the other ones are almost twice as fast.
| IshKebab wrote:
| This is pretty cool. Though...
|
| > all data is now encoded as big-endian.
|
| Why? If the goal is to be fast and simple then surely little
| endian would make more sense given that basically all processors
| are little endian. Big endian just means you need to swap
| everything twice.
|
| Edit: Seems like I'm not the only person that thought this was a
| super weird choice: https://github.com/phoboslab/qoi/issues/36
|
| The reasoning is:
|
| > The file format for QOI will not change anymore.
| mahkoh wrote:
| Looking at the spec, the only multi-byte data are the
| width/height in the header. So only two 4-byte swaps per file.
| politician wrote:
| That issue links to the discussion on QOI v2,
| https://github.com/nigeltao/qoi2-bikeshed/issues.
| Zababa wrote:
| I think refusing a 5% speedup to ensure that the format will
| not change fits really well with the goals of the project.
| ape4 wrote:
| Since it's simple and there have been vulnerabilities in other
| image readers... a mathematically proven reader might be a nice
| selling point.
| Zababa wrote:
| There is, I think, one thing missing: a comprehensive test suite
| that checks all the possible edge cases.
|
| Edit: as you can see below, the benchmark also checks for
| correctness so you can disregard this.
| phoboslab wrote:
| What exactly do you mean by "edge cases"? For what it's
| worth, the decoder has been fuzzed[1], and the benchmark
| suite[2] of 2800 images encodes and decodes without losing any
| pixels, as checked by qoibench[3].
|
| [1] https://github.com/phoboslab/qoi/blob/master/qoifuzz.c
|
| [2] https://qoiformat.org/benchmark/
|
| [3]
| https://github.com/phoboslab/qoi/blob/master/qoibench.c#L440...
| Zababa wrote:
| Oh, I didn't realize that the benchmark also checked for
| correctness.
| fleabitdev wrote:
| I've dealt with image decoding for a game engine before. The
| images in question were large pixel-art texture atlases, stored
| as PNG files and loaded at startup. Their slow decoding speed
| caught me by surprise, given that the file format is 25 years
| old!
|
| The reason turned out to be that Deflate's design makes it very
| hard to parallelise or SIMD-accelerate. Even the best available
| decoders are basically forced to process a byte at a time, in a
| straight line from start to finish, which obviously isn't a good
| match for modern CPUs. The 3x to 4x decompression speedup here is
| nice, but I can't help but feel a little sad about how poorly
| this format is taking advantage of available compute resources.
| (The ultimate dream would be a format which is so parallel-
| friendly that you can just send the binary blob to the GPU and
| decompress it in a compute shader!)
|
| Even a rule like "each row is compressed separately, with a table
| of row lengths at the beginning of the file" might have been
| enough - this would have made compression ratios slightly worse,
| but complexity wouldn't have suffered too much. With only six
| different chunk types, we could perhaps imagine a branchless
| decoder where each row's decoding state is stored in its own SIMD
| lane, and the results for several rows are all emitted at the
| same time...
| asmar wrote:
| An interesting article regarding SIMD in libPNG (although not
| in deflate as complained about):
| https://optidash.ai/blog/the-case-of-the-missing-simd-code
| wongarsu wrote:
| Apple apparently ships a PNG encoder that completely flushes
| the deflate stream every now and then, giving you additional
| offsets where you can start decompressing without dependencies
| on previous data. The offsets are then saved as a separate
| chunk in the PNG. Decoders aware of this can parallelize
| decoding using these offsets, everyone else can read it as
| normal.
|
| The proper solution would of course be to use something more
| modern than deflate, the PNG format even has a metadata field
| that specifies the used compression algorithm. But anything but
| deflate wouldn't be backwards compatible, so nobody seems to
| use that option.
| ByThyGrace wrote:
| > you can just send the binary blob to the GPU and decompress
| it in a compute shader
|
| Surely this exists already?
| wolf550e wrote:
| I think texture compression is always lossy, so it's not
| directly comparable to this or to PNG. So I don't think it
| exists. See ASTC and BC7.
|
| Texture compression can be very advanced:
| https://cbloomrants.blogspot.com/2020/06/oodle-texture-slash...
| fleabitdev wrote:
| Reading that blog post, I was surprised to learn that many
| modern games dedicate more than half of their filesize to
| textures. I haven't played an AAA game in more than a
| decade, but I would have thought that meshes and
| (particularly) audio would use up more space.
|
| It sounds like developers are stuck between a rock and a
| hard place. They need one of the specific compressed pixel
| formats which can be efficiently read by the GPU, but those
| formats are about ten times larger than a similar JPEG
| file, they don't losslessly compress well (modulo RAD Game
| Tools' discoveries here), and recompressing raw pixels to a
| GPU-addressable format at load time would be orders-of-
| magnitude too slow.
|
| RAD Game Tools' approach here is clever, but it feels like
| a bit of a hack. The obvious next step would be a lossy
| compressed image format which can decompress directly to
| BC7, exploiting spatial-correlation tricks similar to those
| which PNG uses to get better results than a gzipped BMP
| file. Has anybody tried this already?
| modeless wrote:
| I wouldn't call Oodle Texture a hack. But there's also
| https://github.com/BinomialLLC/crunch and
| https://github.com/BinomialLLC/basis_universal. The
| latter has the advantage that it can be decoded to
| multiple different compressed texture formats, so that
| all GPUs can be supported from a single file (different
| GPUs support different formats so you can't necessarily
| send the same compressed texture to any GPU).
| fleabitdev wrote:
| Exactly what I was looking for, thanks! :)
| pcwalton wrote:
| > The 3x to 4x decompression speedup here is nice, but I can't
| help but feel a little sad about how poorly this format is
| taking advantage of available compute resources. (The ultimate
| dream would be a format which is so parallel-friendly that you
| can just send the binary blob to the GPU and decompress it in a
| compute shader!)
|
| You can't usefully decompress the _entire_ thing on the GPU,
| but you can do half of it there. You can decompress DEFLATE
| right before sending the data over the bus and do the PNG
| prediction (filtering) in a compute shader. I actually
| implemented this once (PNG prediction is amenable to wavefront
| parallelism) and it worked fine, though it wasn't any faster
| than using SIMD on the CPU because the fact that each row can
| specify a different prediction method means that you end up
| with a lot of divergence in the shader. Still, it could get
| some load off the CPU if your loading is CPU-bound.
| robalni wrote:
| When looking at the benchmark result and image size, it seems
| like this format is good at images that have horizontal
| structures or where the same color is repeated often. It's not
| good at vertical structures, including images with enlarged
| pixels (where one row is identical to the one above). I think the
| main reason for that is that PNG has a filter pass that can look
| at the row of pixels above, while this format only looks to the
| left.
|
| Example of image with vertical gradients:
| https://qoiformat.org/benchmark/images/icon_512/apps-prefere... -
| qoi compresses to 186kb and libpng to 25kb
|
| Example of image with horizontal structure and repeated colors:
| https://qoiformat.org/benchmark/images/textures_pk/mod_walld... -
| qoi compresses to 21kb and libpng to 33kb
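| For comparison, PNG's "Up" filter (one of its five row filters)
| is the pass that subtracts the pixel directly above before
| entropy coding; a minimal sketch of the idea:

```python
# PNG's "Up" filter: each byte stores the difference from the byte
# directly above (mod 256), turning vertical gradients into long
# runs of small values. QOI has no equivalent pass, hence the
# results above.
def up_filter(prev_row, row):
    return bytes((c - p) & 0xFF for c, p in zip(row, prev_row))

def up_unfilter(prev_row, filtered):
    return bytes((f + p) & 0xFF for f, p in zip(filtered, prev_row))

prev = bytes([10, 20, 30])
cur = bytes([12, 20, 29])
assert up_filter(prev, cur) == bytes([2, 0, 255])
assert up_unfilter(prev, up_filter(prev, cur)) == cur
```
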
| adgjlsfhk1 wrote:
| One thing worth noting is that QOI can be pretty easily wrapped
| with something else to change pixel traversal order. If you
| want higher compression, I'd recommend wrapping QOI with ZSTD
| (or similar) compression on the output, and something to change
| the pixel traversal order on the input.
| Akronymus wrote:
| > and something to change the pixel traversal order on the
| input.
|
| I could see a variety of space filling curves working for
| that.
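| The simplest such reordering is a serpentine (boustrophedon)
| scan; a Hilbert curve would preserve 2D locality better but is
| more involved. A sketch:

```python
# Serpentine scan: left-to-right on even rows, right-to-left on
# odd rows, so consecutive pixels in the stream are always
# spatially adjacent. Feeding QOI pixels in this order (and
# undoing it on decode) is one way to "change the pixel
# traversal order" as suggested above.
def serpentine_order(width, height):
    for y in range(height):
        xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        for x in xs:
            yield (x, y)

order = list(serpentine_order(3, 2))
assert order == [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
```
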
| ErikCorry wrote:
| I experimented with adding this to the embedded system Toit. I
| wrote up the experience here
| https://medium.com/@erik_68861/trying-out-the-quite-ok-image...
| and also used it for this little demo on a 2 inch screen:
| https://twitter.com/erikcorry/status/1470023885026467848
|
| It was pretty good, but in the end I don't think I'll be making
| it part of the graphics system. I would really like a bit of
| random access into the image without having to divide it up into
| multiple tiles. And the spec took a direction that didn't really
| suit me - I wanted to use it for GUI-like textures which have
| slabs of colours and anti-aliased edges with varying alpha. In
| the final version of the spec there's no way to encode "the alpha
| changed, but the color is the same" without spending 5 bytes on
| a single pixel. Previously that took 2 bytes.
|
| So I'm still looking for a format with:
|
| * Very small amount of decode state (GIF's 16k is probably too
| much - PNG's 32k is certainly too much).
|
| * Fast decode (these embedded chips are not as fast as desktops).
|
| * Alpha channel (not just on-off like GIF)
|
| * A bit of random access like the parallel PNG, but hopefully
| less bolted-on.
| FullyFunctional wrote:
| All these points are spot on, but I want to reorder the
| priority. I cannot believe that a format introduced in this
| millennium gives no thought to aiding parallel access/decode.
| The single-thread focus will not age well; even low-end
| microprocessors are going multicore today, and going forward
| that will only accelerate.
| kevingadd wrote:
| Even ignoring parallelism, images increasingly need to go
| straight to the GPU, so improving the performance of access
| on-GPU (i.e. using a GPU compressed texture format) can have
| a bigger positive impact on the user experience than making
| the file on disk smaller. Adopting Basis to compress your
| images could in practice be much better than adopting QOI,
| and it should be benchmarked against options like that along
| with PNG or JPEG.
| modeless wrote:
| Have you considered compressed texture formats, perhaps DXT5?
| Fixed compression ratio (1 byte per pixel for DXT5), arbitrary
| random access, decompress the pixels you need on the fly. Can
| be further compressed for efficient storage to about 1-2 bits
| per pixel with https://github.com/BinomialLLC/crunch.
| choeger wrote:
| This is going to be super useful for educational purposes. I'd
| really like to see more file formats like this.
|
| There's already a bunch of minimal programming languages and even
| some operating systems. It's great value for learning stuff when
| you can implement a quite OK version of something in dozens of
| hours or less.
| ape4 wrote:
| I like how simple it is. Of course, getting a new format adopted
| is pretty difficult.
___________________________________________________________________
(page generated 2021-12-20 23:00 UTC)