[HN Gopher] QOI: Lossless Image Compression in O(n) Time
       ___________________________________________________________________
        
       QOI: Lossless Image Compression in O(n) Time
        
       Author : Ragnarork
       Score  : 790 points
       Date   : 2021-11-24 10:51 UTC (12 hours ago)
        
 (HTM) web link (phoboslab.org)
 (TXT) w3m dump (phoboslab.org)
        
       | adgjlsfhk1 wrote:
       | Why not separate the format for RGB vs RGBA based on a header?
        | Since images will never have both RGB and RGBA pixels, you would
        | be able to gain an extra few percent of compression by using
        | different (and conflicting) tags for the two types of images.
        | For example, you could have QOI_COLOR_RGB, which would use a
        | 5-byte tag, and QOI_DIFF24_RGB, which could store 7-bit
        | differences for two colors and 6-bit differences for the other
        | color. For RGBA, you could do something similar for the two
        | tags that are RGB specific.
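        | 
        | One purely hypothetical way those 24 bits could be split for an
        | RGB-only file (illustrative only, not part of the format), using
        | the same notation as the spec draft:
        | 
        |     QOI_DIFF24_RGB {
        |         u8 tag :  4;   // tag pattern reused from the
        |                        // RGBA-only op it replaces
        |         u8 dr  :  7;   // 7-bit   red channel difference
        |         u8 dg  :  7;   // 7-bit green channel difference
        |         u8 db  :  6;   // 6-bit  blue channel difference
        |     }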
        
         | HanClinto wrote:
         | Or even based on a #define if one didn't want to check a bit
         | flag every time you need to encode a full pixel. Would not be
         | difficult to have two versions of the library -- one for RGB,
         | and one for RGBA.
        
       | tonmoy wrote:
       | I am not familiar with image encodings, but isn't the idea of
        | complexity relative? For example, getting rid of Huffman encoding
       | and byte aligning everything may be less complex for computers,
       | but when making energy and area efficient hardware, we may not
       | care that much about some of these things?
        
         | selcuka wrote:
         | For a small number of operations, sure, but when you need to
         | compress a million, or a billion images, the performance
         | optimisations quickly add up.
        
       | khitchdee wrote:
       | This algo would work well for computer generated synthetic
       | images, not as well for scanned images.
        
         | ripdog wrote:
         | Which is why the author is positioning it as an alternative to
         | PNG, not JPEG.
        
       | jiggawatts wrote:
       | I know you don't like "design by committee" and overly complex
       | formats. Also, the compression algorithm itself is somewhat
       | independent of the "container format" that it's used with.
       | 
       | However, if I may make a suggestion: If you _do_ end up rolling
       | your own container format for any reason, always include a way to
       | store the colour space of the image!
       | 
       | Treating RGB images as arrays of bytes without further tags is
       | like treating char* as "text" without specifying the code page.
        | It _could_ be UTF8. Or Latin-1. Or something else. You don't
       | know. Other people don't know. There's often no way to tell.
       | 
       | Similarly with colour spaces: sRGB, Display P3, and Adobe RGB are
       | _just_ close enough to be easily confused, but end up looking
       | subtly wrong if mixed up.
        
         | fabrixxm wrote:
         | It has data size in header:
         | https://github.com/phoboslab/qoi/blob/master/qoi.h#L77-L82 Just
         | put exif info after data :P
        
         | sam0x17 wrote:
         | I've never understood the need for color profiles on images
         | themselves. On monitors and displays, sure, those vary, but why
         | allow people to specify a color profile on an image, thus
         | making images no longer a ground truth for pixel value? It
         | seems like the main use case is dealing with images that
         | already have color profiles. What is the point?
        
           | kaetemi wrote:
           | Because your pixels need to transform from the image color
           | profile to the display color profile.
           | 
           | By default this all falls back to sRGB, which is not the
           | "ground truth pixel value" you may be assuming it is.
        
             | sam0x17 wrote:
             | Right, which begs the question why on earth would there be
             | more than one image color profile. The image color profile
             | should just be implied and the same everywhere.
        
               | kaetemi wrote:
               | For correct editing operations, and correct display.
               | Sadly, it's not pedantry, it's reality.
               | 
               | The 0.5 value is not half the brightness of 1.0 in sRGB.
               | It merely means half the power output to your CRT.
               | 
               | If you want correct editing operations, you need to work
               | in linear RGB.
               | 
               | It gets more fun. What is white? The color of your
               | backlight? The color of white paper under your particular
               | desk light?
               | 
               | And what is black? Is it pure darkness, or the black ink
               | of your printer? Does your printer represent pure black,
               | or dark gray?
               | 
               | We're stuck with images stored in the display profile of
               | old CRTs by default, because that was the most practical
               | option at the time.
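                | 
                | A minimal C sketch of the standard sRGB decoding
                | curve, just to put a number on that 0.5 remark
                | (the constants are the published IEC 61966-2-1
                | ones; the function name is my own):
                | 
                |     #include <math.h>
                |     #include <stdio.h>
                | 
                |     /* encoded sRGB value in 0..1 -> linear light */
                |     static double srgb_to_linear(double c) {
                |         return (c <= 0.04045)
                |             ? c / 12.92
                |             : pow((c + 0.055) / 1.055, 2.4);
                |     }
                | 
                |     int main(void) {
                |         /* prints ~0.214: half the code value is
                |            only about a fifth of the light */
                |         printf("%f\n", srgb_to_linear(0.5));
                |         return 0;
                |     }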
        
               | sam0x17 wrote:
               | > We're stuck with images stored in the display profile
               | of old CRTs by default, because that was the most
               | practical option at the time.
               | 
               | The analogue in photography is photographs lose color
               | clarity and fade as they age. Why should I care if this
               | is the case in images as display technology evolves when
               | this is already a problem with every physical medium?
               | 
               | > For correct editing operations, and correct display.
               | Sadly, it's not pedantry, it's reality.
               | 
               | I've been designing images and graphics for the web and
               | sometimes for print off and on as a regular part of my
               | various job functions since ~2009 and I have yet to see a
               | situation where a color profile on an image isn't a huge
               | pain and something that needs to be stripped away to get
               | seemingly correct results.
               | 
               | Back to my point, I still don't see the value of having a
               | color profile applied to an image. Images exist in the
               | ether as platonic ideals, and we try to approximate them
               | with our various display technologies. Why complicate the
               | matter by also having multiple "flavors" of the platonic
               | ideal itself? When I say (255, 0, 0), I expect the
               | display to show me its best approximation of red. When I
               | say (255, 255, 255), I expect the display to show me
               | something as close to white as possible (and at the
               | brightest possible setting). When I say (0, 0, 0), I
               | expect the display to show me something that looks as
               | close to black as possible. It's up to the display
               | technology to decide whether this means just turn off
               | that pixel on the screen or disengage the backlight or do
               | whatever, but at the end of the day it's just trying to
               | approximate black.
               | 
               | This is complicated enough, why do I need an image that
               | will look good on only one kind of display and will have
                | its greens look pink on other displays. Isn't having a
               | color profile for the display enough?
        
               | kaetemi wrote:
               | I get what you mean, but... the color profile of the
               | image is the profile of the display where the image can
               | be displayed without adjustment. Think of it as the
               | display profile of the guy that sent you the image. The
               | magic math transforms the image to your display profile.
               | That means it will look exactly the same on both
               | displays. If they both have a correct display profile.
               | 
               | If your green is going pink, then either your profiles
               | are wrong, or your software is broken. Maybe it really is
               | pink, and it just looks green for you, because you're
               | ignoring color profiles.
               | 
               | But the fact is, most software is broken, and you should
               | store images with the sRGB profile.
               | 
               | And also, you can actually calibrate consumer hardware,
               | so that you can scan a photo, and reprint it. And the
               | scan, display, and print will look _exactly_ the same.
                | (It's not the case by default, because consumer printers
               | do what you do, stretch or fit the color space to the
               | printer's. The Vivid and Natural profiles in the driver,
               | respectively. This is a good default for documents, not
               | for professional photography.)
        
               | romwell wrote:
               | >That means it will look exactly the same on both
               | displays. If they both have a correct display profile
               | 
               | We can continue believing that Santa exists, or we can
               | accept that effectively nobody has correct color
                | profiles, and nobody cares either.
               | 
               | It's nice metadata you have there, would be a shame if I
               | applied _night mode_ to it at 6PM.
               | 
               | > also, you can actually calibrate consumer hardware
               | 
               | ...with professional hardware that costs more than the
               | one you have at hand.
               | 
               | Again, pretty much everyone will just tweak their
               | monitor/printer settings until they get results that look
               | OK.
               | 
               | >display and print will look exactly the same
               | 
               | Until you turn off the lights. Or until you turn _on_ the
                | lights (what color temperature is your light?). Or until
               | the wind blows, moving that cloud over the sun, changing
               | light temperature.
               | 
               | Or -- wait for it -- actually, that's all, while you've
               | been waiting the sun has set.
               | 
               | The point being, all we can shoot for is FF0000 being
               | what would be "red" for most, 00FF00 being "green", and
               | 0000FF being "blue" -- and then accept the "fade" of the
               | digital image from one screen to another as a fact of
               | life.
               | 
               | So, learn how to stop worrying and love unspecified RGB
                | colorspaces. Magenta only exists in your brain anyway [1]
               | 
               |  _Side note_ : did you ever think about how your recorded
               | music isn't adjusted for the frequency response of the
               | loudspeakers, and half the people will listen to it with
               | the "BASS MEGABOOST" feature on anyway?
               | 
                | That's why the art of _mastering_ exists. It's accepting
               | the reality, and putting work into making it work with
               | _uncalibrated_ , crappy hardware -- as well as hi-fi
               | gear.
               | 
               |  _PS_ : have fun calibrating your vision to make sure
               | that you don't see green where I see red
               | 
               | [1]https://medium.com/swlh/magenta-the-color-that-doesnt-
               | exist-...
        
               | cornstalks wrote:
               | > _When I say (255, 0, 0), I expect the display to show
               | me its best approximation of red._
               | 
               | Which red? There isn't a single, true "red". And
               | different materials produce different reds (my work's
               | Sony BVM-X300's reddest red is going to look way
               | different than your monitor's red). Not all displays use
               | only RGB color primaries, too. For example, at Dolby's
               | office in San Francisco they have a projector in their
               | theater that uses 5 (IIRC) color primaries, not 3.
               | 6-primary projectors exist. And other non-RGB displays.
               | 
               | > _When I say (255, 255, 255), I expect the display to
               | show me something as close to white as possible_
               | 
               | Which white? D50? D55? D60? D65? D67? Something else? And
               | yes, these different white points (and many others) are
               | actually used in practice.
               | 
               | > _(and at the brightest possible setting)._
               | 
               | 100 nits looks way, way different than 4,000 nits. Some
               | monitors can do 10,000 nits.
               | 
               | > _When I say (0, 0, 0), I expect the display to show me
                | something that looks as close to black as possible. It's
               | up to the display technology to decide whether this means
               | just turn off that pixel on the screen or disengage the
               | backlight or do whatever, but at the end of the day it's
               | just trying to approximate black._
               | 
               | Which black? This might sound dumb, because we can agree
               | that there is an objective "absolute black" (i.e. zero
               | photons). But when an artist creates an image, the
               | monitor they use has some version of black. If you don't
               | account for that, the image may be distorted. Blacks can
               | be crushed, for example.
               | 
               | An absolute color space exists. It's called XYZ. We could
               | use it. Some image formats support it.
        
               | formerly_proven wrote:
               | > An absolute color space exists. It's called XYZ. We
               | could use it. Some image formats support it.
               | 
               | XYZ is really annoying to work in though. ACES is a
               | pragmatic solution here, quite literally:
               | https://blog.frame.io/wp-content/uploads/2019/09/ACES-
               | APO.jp...
               | 
               | Okay, fair enough two out of three primaries are
               | imaginary colors, but nobody said you have to use the
               | whole color space when your processing pipeline is meant
               | to be using 32 bit floats. Delivery formats might want to
               | be using a more real color space.
        
               | pixel_fcker wrote:
               | ACES (AP0) is an archive format for when you want to be
               | absolutely sure you're not clipping any colours. As a
               | working space it's terrible and AP1 should be preferred,
               | either as ACEScg or ACEScc
        
               | Dylan16807 wrote:
               | AP1 is almost the same as rec.2020? I'm not sure what you
               | mean by 'working space' but if preserving all colors was
               | a primary or even secondary goal I definitely wouldn't
               | cut myself down that far.
        
               | spc476 wrote:
               | And the red in my left eye isn't the same as the red in
               | my right eye. Yes, when the lighting conditions are _just
               | right_ , I see different hues out of each eye [1]. I have
               | to wonder how one would (or should?) correct for that in
               | an image file format.
               | 
               | [1] I think from a stupid thing I did as a kid.
        
               | dekhn wrote:
               | took me a while to notice that I perceive color and light
               | intensity differently in my two eyes. I think this is
                | actually pretty natural (i.e., it happens commonly?).
                | Either way, I can also see polarization (Haidinger's
                | brush), which confused me a bunch when I was trying to
                | explain what I saw and everybody else thought I was
                | crazy.
        
               | cornstalks wrote:
               | I think the best we can do with file-level metadata is
               | account for only objective metrics. Subjective
               | differences are best handled outside of the file
               | metadata. The user can tune their display settings to
               | match their preferences. This allows for correcting for
               | physiological differences and stylistic preferences. If
               | all files had correct metadata (for objective metrics),
               | it would be a lot easier for an end user to tune the
               | system to their liking because their end results would be
               | consistent with their preferences.
        
               | kaetemi wrote:
               | The sRGB color #ffffff should never mean "the brightest
               | white on this monitor", unless you're using an average
               | CRT.
               | 
               | Just imagine you're using an HDR display, where the
               | brightest white is as bright as the sun.
        
               | makapuf wrote:
               | OK but then when will you use that "bright as sun" color?
                | If not, why provide them? If so, what color will you use?
        
               | formerly_proven wrote:
               | sRGB is quite literally defined as "the average 90s CRT
               | in an office environment", i.e. full-white sRGB on a HDR
               | display should be around 100 nits or so in those
               | reference conditions (i.e. display set to "reasonable"
               | brightness).
        
               | adgjlsfhk1 wrote:
                | The color you will use isn't in sRGB, but is in other
               | color spaces such as Rec2020.
        
               | kaetemi wrote:
               | VR displays. Probably the color of the sun in linear RGB
               | color space for rendering purpose, then converted to the
               | display color space.
        
               | formerly_proven wrote:
               | You're assigning a color profile to the image in your
               | model - the color profile of the display. Hence no color
               | conversion needed, as source and target match.
               | 
               | What "red" and "green" are has changed quite dramatically
               | with different display technologies. A display designed
               | to meet Rec.2020 can show colors that other displays
               | literally cannot produce and the deviation between the
               | primaries is so big that everything looks like garbage if
               | you don't do a color space conversion. Take some sRGB
               | content and display it on a DCI P3 display. Looks like
               | shit. Humans look like crabs.
               | 
               | > On monitors and displays, sure, those vary, but why
               | allow people to specify a color profile on an image
               | 
               | The sole reason why we have device profiles defining
               | device color spaces is so that we can convert from the
               | image's color space to the device's color space. If
               | images don't have a profile assigned to them, you don't
               | need a device profile.
               | 
               | So you either have both image and device profile, or
               | neither. Just one doesn't make sense.
        
               | Sharlin wrote:
               | What does "255" mean? If you just take it to mean "max
               | saturation that your display can output" then it's going
               | to be perceived as a different color depending on the
               | viewer's display. That is undesirable. And due to
               | historical reasons (the capabilities of CRT monitors) the
               | meaning of "max saturation" was de facto standardized to
               | what we now call sRGB. If you want to encode a greater
               | range of colors than sRGB then you need to be able to say
               | that hey, this "255" is more saturated than what sRGB
               | considers "255" and should not be displayed on sRGB
               | without conversion.
        
               | AlotOfReading wrote:
               | Because display technology can't reproduce the full set
               | of colors we see and and moreover, we have a finite
               | number of bits to encode the value of any pixel. A color
               | space/profile is both a target for display manufacturers
               | to produce and a way of standardizing the trade off
               | between bit depth and quantization error.
        
               | floatingatoll wrote:
               | Unfortunately, that's not the case in pragmatic reality.
               | 
               | When I say #ff0000, do I mean "sRGB red", as a TV would
               | display in pure red phosphors back in the 1970s, or do I
               | mean "1000K" / 700nm red (daylight is 6500K), which can
               | only be reproduced by specialized displays?
               | 
               | Most people from the time before Apple made Display P3
               | wide color displays standard in all their products -- so,
               | most HN commenters -- believe that #ff0000 just means
               | "only red, no green or blue" -- but all web browsers
               | currently are left to invent their own answer for what
               | #ff0000 means, and they do not all invent the same
               | answer. Yet.
               | 
               | So it is with images.
               | 
               | In the beginning times, people made sixteen-color images
               | for NTSC monitors, and then other people learned you
               | could display synthetic colors that don't exist by
               | mangling the image bitstream to the monitor, and those
               | sixteen colors were hard-coded but varied by like +/-5%
               | per device due to acceptable manufacturing tolerances, so
               | they were internally consistent anyways.
               | 
               | And so we end up, today, trying to solve color profiles
               | for file formats that specify colors as RGB hex values --
               | which are, by definition, restricted to sRGB and thus
               | wildly insufficient. But if we plan too far, we get file
               | formats that are totally ridiculous - TIFF comes to mind
               | - that can represent anything under the sun, at the cost
               | of having a 1:1 mapping of "pixels" to "bytes" or worse.
               | 
               | You may also find the history of the EXR format relevant
               | here, as it supports a functionally infinite amount of
               | dynamic range in images, and there are definitely good
               | niche use cases for it -- but pragmatically, it's
               | ridiculously overkill for normal everyday purposes. You
                | could argue that everyone should use EXR in L*a*b, but
               | then someone would suggest Oklab (since it makes better
               | perceptual compromises) or HSLuv (what a great name), or
               | you could try storing raw CIE 1931 chromaticity
               | coordinates, which has been deprecated by CIE 170-2 to
               | address serious flaws in the 1931 system.
               | 
               | Tying this back to engineering, there are often very good
               | reasons for designing a processor or a byte store to be
               | RISC or CISC, big or little endian, and we simply cannot
               | predict what will work _most pragmatically_ for all cases
               | universally ahead of time, any more than the metric
               | system was invented early enough in time to prevent miles
               | per hour from being a unit of measure.
        
           | kortex wrote:
           | No single color space is complete. Each has trade-offs. I
           | would actually say that color profiles help get _closer_ to
           | ground truth, that is, rendering intent.
        
             | demute wrote:
             | CIE XYZ is complete.
        
               | lstamour wrote:
               | Isn't that kind of like saying Real numbers are complete?
               | Yes, it's true, but we don't have an infinite level of
               | precision for each colour. So we have to, practically
               | speaking, limit how many colours can be specified to some
               | reasonable (and arbitrary) number, e.g. 10 million or 1
               | billion or more (in terms of number of colours in a
               | colour space) and representation in digital bits. Also,
               | since I conflated the two a bit, it is often helpful to
               | restrict what the max range of a colour is so that you
               | can get more useful data out of the same precision. It's
               | kind of like creating a dictionary to compress data, you
               | can express more useful values with less data by choosing
               | to display a subset of colours and by picking where the
               | white point is for your display. These optimizations used
               | to mean more than they do today, perhaps, but it can
               | still take a lot of effort to transfer 10-bit or 12-bit
               | raw pixel data at 4K resolution, 120 times a second...
               | And it's worth pointing out that while 8-bit is all you
               | might need as precision for perceptually lossless HDR
               | (with dithering), that kind of range isn't good enough
               | for colour editing work. It's like starting a photo edit
               | from the JPEG rather than the RAW file.
        
               | adgjlsfhk1 wrote:
                | 8 bits just objectively isn't enough for HDR. 8 bits is
                | few enough that you can pretty easily see banding on a
                | simple color gradient in SDR. Dithering helps a lot, but
                | once you get to HDR, 10 bits is pretty much the minimum.
        
               | Dylan16807 wrote:
               | > Isn't that kind of like saying Real numbers are
               | complete?
               | 
               | It's more like saying the range from 0..2 is complete so
               | don't use an encoding that caps out at 0.8 or 1.1
               | 
               | The difference between all visible colors and a more
               | constrained color space is only a factor of two or three.
               | Less than half a bit per channel.
        
               | kortex wrote:
               | CIE XYZ has imaginary colors, though.
               | 
               | I guess when I said "no color space is complete" I meant
               | "no color space is faithfully bijective to nominative
               | human visual sensory response." Because such a thing is
               | basically impossible (or insanely nonlinear and
               | unwieldy).
               | 
               | Also these color systems all or almost all assume a
               | "norminal" (to steal a word from Everyday Astronaut,
               | normal+nominal) average human vision system. If your
               | system isn't norminal, average, human, or even visual
               | (visible light), then your color space is an incomplete
               | map of the territory.
               | 
               | So for any given array of pixel channel, there is either
               | an implicit assumption or explicit color space of what
               | those values "mean".
        
               | Dylan16807 wrote:
               | Why does it matter if it's bijective?
        
               | kortex wrote:
               | So GP's question was basically "why do we need color
               | profile?" and the short answer is "to know what images
               | _should_ look like ".
               | 
               | In order to know what an image _should_ look like, a
               | "perfect" or "complete" representation, at minimum the
               | real colors must be injective to the color space (every
               | value has at most 1 color). You could argue that the
               | color space needn't be injective (values that have no
               | color), which is probably fine if your image is static
               | and you never manipulate it. As soon as you manipulate
               | values of your image in your color space, you run the
               | risk of falling outside the color domain.
               | 
               | But really to unpack this whole thread, my essential
               | point is that color spaces are about engineering
               | tradeoffs. sRGB can't reproduce many colors. CIE 1931
               | represents all color but has imaginary colors.
               | 
               | Ergo, I contend there is no "complete" color space.
        
               | contravariant wrote:
               | If you agree with their model of human vision anyway.
        
             | makapuf wrote:
             | Using one given color profile is part of the trade-off.
        
           | dahart wrote:
           | The point is to use bits effectively and not waste them, by
           | accounting for what _usually_ happens on most displays and in
           | our eyes  & brains. If we don't account for it, then (for
           | example) a 1-bit difference between two bright colors means
           | something very different from a 1-bit difference between two
           | dark colors.
           | 
           | So what is "ground truth" exactly? We need to know what units
           | we're talking about; is it lumens, or "tristimulus" values
           | (what we want our eyes to see), or LCD monitor voltages, or
           | something else? First we have to agree on what ground truth
           | means.
           | 
           | A big reason for common formats like PNG and JPEG to need
           | color profiles is because 8 bits per channel is not enough
           | color resolution unless you store the colors non-linearly. 8
           | bits is barely enough (or some would say _almost_ enough) to
           | represent low-dynamic-range images on TVs and monitors even
           | when you have the colors gamma encoded or use a perceptual
           | color space.
           | 
           | The problem with storing "ground truth" values for pixels in
           | 8-bits-per-channel formats is that many experts would
           | probably say this implies a linear color space. And a linear
           | color space encoded in 8 bits leaves noticeable gaps between
           | colors, such that you can see color banding if you try to
           | render a smooth gradient. You lose up to 2 bits of color
           | precision in some regions, notably the dark colors.
           | 
           | So, in order to not have color banding, we correct the colors
           | before storing them by trying to make the colors have even
           | steps between any 1-bit change in the spectrum. The color
           | profile is just a record of what was done to correct the
           | colors, so someone else can un-correct them as needed.
           | 
           | You might think we could do away with color profiles for
           | high-dynamic-range images, but the non-linearity of our
           | vision and our displays means that linear encodings are still
           | inefficient with their bits even if you have 32 bits per
           | channel.
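            | 
            | To put rough numbers on the banding argument, here is a
            | small sketch (assuming the standard sRGB curve; the helper
            | name is my own) comparing the linear-light size of one
            | 8-bit code step near black for a linear encoding versus an
            | sRGB-style encoding:
            | 
            |     #include <math.h>
            |     #include <stdio.h>
            | 
            |     static double srgb_to_linear(double c) {
            |         return (c <= 0.04045)
            |             ? c / 12.92
            |             : pow((c + 0.055) / 1.055, 2.4);
            |     }
            | 
            |     int main(void) {
            |         /* first code step above black, stored linearly */
            |         double linear_step = 1.0 / 255.0;
            |         /* the same code step, stored sRGB-encoded */
            |         double srgb_step = srgb_to_linear(1.0 / 255.0);
            |         /* the sRGB-coded step is about 13x smaller here */
            |         printf("%g vs %g\n", linear_step, srgb_step);
            |         return 0;
            |     }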
        
           | mlhales wrote:
           | A color space and color profile are not the same. The point
           | is knowing what a byte represents in a byte array. Is it red,
            | hue, saturation, alpha, brightness, etc.?
        
             | romwell wrote:
             | The format in question specifies this though.
        
         | amelius wrote:
         | Yes, also store the DPI.
         | 
         | In general, allow arbitrary key-value pairs to be stored.
        
           | makapuf wrote:
           | If you want any interop you'd better standardize some keys,
           | allowing for extensions. Store those in key,len,data records
           | and you have iff/ riff formats ... used by .avi and .wav
           | formats.
        
           | mkl wrote:
           | > Yes, also store the DPI.
           | 
           | I don't think that's a good idea. Most images, like photos
           | and screenshots, don't have physical dimensions. Some image
           | tools like to put in bogus default DPI values that are
            | guaranteed to be wrong, and it can be a bit of a pain.
        
             | amelius wrote:
             | How else would you reproduce images at exactly the right
             | size? Without DPI, the format would be useless for many
             | scientific purposes.
             | 
             | But I get your worries. As a compromise, DPI could be
             | optional.
        
               | mkl wrote:
               | Yes, optional is fine.
        
         | kortex wrote:
         | Since we are talking container formats, I would recommend
         | something in the IFF/RIFF family (which includes "true" Four-
         | byte-type IFFs like RIFF, WAV, WEBP, and variants like PNG,
         | MOV, MP4). They are about as simple as you can get as a
         | container format. It has its limitations but it's basically the
         | simplest "boxed" format you can implement - Type-Length-Value.
         | 
         | I would absolutely recommend basing on some sort of container
         | format (vs "raw" data structures), it makes it really easy to
         | add functionality without breaking backwards compatibility.
         | 
         | Basically all container formats follow this idea of "collection
         | of boxes" pattern. The IFF series are box-length based. Ogg
         | uses capture patterns but no explicit box size (I think). The
         | former means you have to know the box size ahead of time (or
         | seek back and write it in), the latter is better for streaming,
         | but harder to seek (without indexes).
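          | 
          | A minimal sketch of the Type-Length-Value idea in C: a
          | RIFF-style chunk with a 4-byte type, a little-endian 32-bit
          | length, the payload, and even-byte padding. The "META" chunk
          | name in the usage comment is made up, not part of any spec:
          | 
          |     #include <stdint.h>
          |     #include <stdio.h>
          | 
          |     static void write_chunk(FILE *f, const char type[4],
          |                             const void *data, uint32_t len) {
          |         uint8_t lenbuf[4] = {
          |             len & 0xff, (len >> 8) & 0xff,
          |             (len >> 16) & 0xff, (len >> 24) & 0xff
          |         };
          |         fwrite(type, 1, 4, f);     /* type            */
          |         fwrite(lenbuf, 1, 4, f);   /* length          */
          |         fwrite(data, 1, len, f);   /* value (payload) */
          |         if (len & 1) fputc(0, f);  /* pad to even, as RIFF does */
          |     }
          | 
          |     /* usage: write_chunk(f, "META", "sRGB", 4); */
          | 
          | A reader that doesn't recognize a chunk type can simply skip
          | length bytes, which is what makes this kind of container easy
          | to extend without breaking old decoders.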
        
         | Asooka wrote:
         | I work in a VFX adjacent space, so I have some pretty strong
         | feelings about what data should be stored in the colorspace
         | tags. The colorspace data should be an arbitrary string. I
         | would also like to have "usage intent" tag for whether this is
         | color or non-color data. You would think that the colorspace
         | would tell you, but people do silly things like save specular
         | maps as sRGB. For completeness I would want the following
         | fields:
         | 
         | - colorspace - arbitrary string, the colorspace of the color
         | data on disk.
         | 
         | - rgb_primaries - arbitrary string, indicates the physical
         | meaning of the linear RGB values.
         | 
         | - transfer_function - arbitrary string, indicates the function
         | to use to linearise the values from disk into the RGB
         | primaries.
         | 
         | - intent - one of display, texture, data, or an arbitrary
         | string. Display images get tone-mapped, textures do not, data
         | only gets linearised, even if it has rgb_primaries attached,
         | other values only have internal meaning to the concrete art
         | pipeline.
         | 
         | For the first three, rgb_primaries and transfer_function take
         | precedence over colorspace. The reason I want them is because
         | people do mix and match. You could say that you can put the
         | same data in some string format in the colorspace tag, but then
         | everyone will invent their own slightly different way of
         | specifying it and interoperability will suffer. Best to just
         | force the values to be unstructured.
         | 
         | A lot of arbitrary strings, but that's how it is - there is
         | really no standard list of colorspace names. These four fields
         | can cover pretty much everything I have seen people do.
         | 
         | Though currently a lot of people just tag the filename with a
         | colorspace suffix and call it a day. OpenColorIO even has a
         | function to directly get a colorspace from a filename.
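          | 
          | For illustration, those four tags as a plain C struct --
          | purely hypothetical field names and sizes, not taken from
          | any existing format:
          | 
          |     typedef struct {
          |         char colorspace[64];         /* e.g. "ACEScg"         */
          |         char rgb_primaries[64];      /* meaning of linear RGB */
          |         char transfer_function[64];  /* how to linearise      */
          |         char intent[32];             /* "display", "texture",
          |                                         "data", or custom     */
          |     } color_metadata;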
        
         | em3rgent0rdr wrote:
         | That metadata could be stored in the filename to keep it out of
         | the actual file.
        
           | dylan604 wrote:
           | No, the filename is the worst place to put metadata. Nobody
           | can control what will happen to a file name.
        
             | em3rgent0rdr wrote:
             | well nobody can control what will happen to a file's
             | content either if you want to be pedantic.
             | 
              | If you are worried about dumb users renaming it... I think
              | most users know not to change things after a '.', so you
              | could just put the colorspace code there. And then you could
              | put another '.QOI' to end the filename.
             | 
             | If want to keep cruft out of the file itself, then putting
             | it in the filename works.
        
         | cyber_kinetist wrote:
         | I think the author's intending to use this only as a data
         | packing format (ex. for app/game development) and not a
         | general-purpose exchange format. If all you want to do is pack
         | image/texture data for your game then I totally get why they
         | assume RGBA8 storage and leave out metadata such as color space
         | information.
        
           | Const-me wrote:
           | Videogames don't need color spaces, but they have different
           | compression formats on GPUs, like S3TC, BC7, or ASTC. These
           | things save not just storage, also VRAM bandwidth.
        
           | zcw100 wrote:
           | I wonder if color profiles will be important to images used
           | for AI. Right now, as far as I can tell, people generally
           | ignore details like that but I wonder if someday people are
           | going to say, "oh wait, we've been completely ignoring color
           | profiles. We could have been using that" or "we didn't
           | realize how ignoring that messed things up"
        
             | kortex wrote:
             | I'm in the computer vision space and have looked into the
             | effect of color spaces and encodings.
             | 
             | People had basically exactly the epiphany you described.
             | For example, there are several color models which try to be
             | more lighting-invariant or (racially) color blind (this is
             | a very similar problem, humans are all more or less
             | "orange" hue-wise due to melanin's spectrum). TSL is one
             | such space, there are others.
             | 
             | https://en.m.wikipedia.org/wiki/TSL_color_space
        
           | pornel wrote:
           | I know games tend to assume color spaces from context, and do
           | other hacky things like reusing alpha for height maps or RGB
           | for vector coordinates.
           | 
           | But this approach is a source of endless conflict with image
           | processing tooling, because nobody except the game author
           | knows what the channels mean. Game developers keep
           | complaining about image processing tools doing anything other
           | than bytes-in-bytes-out on that data (e.g. clearing "unused"
           | alpha or treating RGB as human-perceptible colors), because
           | it breaks the unstated assumptions.
        
             | dv_dt wrote:
             | Yup embed enough metadata so you can actually nicely
             | automate/integrate doing the right thing with tooling.
        
           | jameshart wrote:
           | Right - this thread of people asking for colorspaces, dpi
           | data, arbitrary metadata collections... all are exhibiting
           | precisely the 'expert group' committee mindset that this
           | format is supposed to avoid.
           | 
           | If you want all that, there are formats galore that support
           | it! The point is more: where are the formats for people who
           | don't need all that? Here is one. Please don't try to ruin
           | it.
        
             | alkonaut wrote:
             | You don't need to support a ton of complex stuff, but if an
              | RGB image format _doesn't_, then it should just have a
              | version marker and the spec can simply say "the RGB data
             | represents sRGB colors".
             | 
             | Just like e.g. some compilers don't support a lot of source
             | file encodings but their non-support consists of clearly
             | stating that a source file is UTF-8.
             | 
             | Note that this requirement is only for an image exchange
             | format. A raw-er image format used internally in e.g a game
             | doesn't need it as badly.
             | 
             | But in an exchange format _every_ user needs to know
             | whether the image data is sRGB or Adobe. Because the reader
             | didn't encode the data.
        
       | giorgioz wrote:
       | >I can almost picture the meeting of the Moving Picture Experts
       | Group where some random suit demanded there to be a way to
       | indicate a video stream is copyrighted. And thus, the copyright
       | bit flag made its way into the standard and successfully STOPPED
       | MOVIE PIRACY BEFORE IT EVEN BEGAN.
       | 
       | This bit made me laugh out loud :D
        
         | ur-whale wrote:
         | > This bit made me laugh out loud :D
         | 
         | Yeah, it's funny, until you actually go read the MPEG spec.
         | documentation.
         | 
         | Then any and all inklings of laughter will very quickly
         | evaporate.
        
           | h4kor wrote:
           | What does it say about these bits?
           | 
           | I'm currently tinkering around with mp3s and wondered how
           | many "useless" bits are in each frame.
           | 
           | I wonder if they are ever used to trace distributors of
           | pirated music, similar to yellow dots in printers.
        
         | BlueTemplar wrote:
         | Indeed, however remember the mandatory "What colour are your
         | bits ?" :
         | 
         | https://ansuz.sooke.bc.ca/entry/23
        
         | barrkel wrote:
         | I rather wonder if things like this are used to encode a proof
         | of intent.
         | 
         | If something is marked as copyrighted, it may be trivial to
         | remove the mark, but at least you had to remove the mark, and
         | that's evidence of intent rather than ignorance.
        
           | akx wrote:
           | Or a piece of software in your toolchain has been unaware of
           | the bit and inadvertently stripped it.
           | 
           | Is that evidence of intent still?
        
             | barrkel wrote:
             | I'm not arguing in favour of it, I'm hypothesizing why it
             | might be added.
             | 
             | Your toolchain should set the bit last thing before
             | distribution, I'd expect.
             | 
             | If you're talking about a toolchain on the consumption side
             | after distribution, well, consumers shouldn't be modifying
             | content, is what I'd expect the position to be. No
             | derivative works and all that.
        
       | zosoworld wrote:
       | I'm wondering how scalable would this be for hyperspectral
       | images, with 25 channels for example. I'm guessing it's just
       | repeating the processing for RGB channels but for 25?
        
         | jameshart wrote:
         | Very much the kind of expert group design-by-committee type of
         | requirement that this simple compression scheme was designed
         | not to handle.
         | 
         | Since this scheme works primarily on exact cross-channel pixel
         | matches, no. No it won't work well for 25 channel images.
         | 
         | Which is a good thing. Most people are not trying to encode 25
         | channel hyperspectral data and if they are they can use their
         | own format designed well for that purpose (or a committee-
         | derived format that handles it adequately).
         | 
          | Or they can split the 25 channels across seven 4-channel images
          | and encode them using this.
        
           | zosoworld wrote:
            | I did not try to imply that this should cover the other
           | use case, but I was wondering whether the compression
           | percentage would scale up with more channels that might not
           | be related in the same way RGB are (as in luminance, for
           | example).
           | 
           | This effort really makes me want to try to do something
           | similar for the other use case, and see how well it fares
           | compared to the convoluted standards that are used for this
           | topic.
           | 
           | Thank you for your insight!
        
       | blondin wrote:
       | this is so satisfying to see. kudos to developer!
        
       | rocqua wrote:
       | I immediately thought of a use-case for (a modified version of)
       | this: raw compression in cameras.
       | 
       | Currently Sony has two options. Uncompressed raw that is huge, or
       | lossy compressed raw that is half the size. They probably chose
       | those options because throughput when shooting quickly matters a
       | lot, so something like PNG would slow down shooting too much.
       | 
       | I imagine a blazingly fast halfway decent compression algorithm
        | would be great. It might even speed up shooting by needing to
       | write fewer bytes. But in general people taking star pictures
       | will love having compression without it being lossy.
       | 
       | For reference an uncompressed raw file for 44MP is about 90MB.
       | That starts filling up drive space rather fast.
        
         | w-m wrote:
         | I don't think this method fits camera raws particularly well.
         | The real world is noisy, so it becomes very unlikely that you
         | would ever hit the two most compressible methods of QOI, (1)
         | pixel value repeat or (2) index into one of the previous 64
         | pixels. QOI works really well for 8-bit screenshots, with lots
         | of pixel-values repetitions. No such luck in noisy 14-bit real-
         | world images.
         | 
         | Also, this format was designed to be as simple as possible, and
          | is byte-aligned; it only handles 8-bit RGBA. You would need a
         | different design for 12/14-bit RGB or RGBG (Bayer).
        
       | carrolldunham wrote:
       | Could a hash collision make it inexact? so not guaranteed
       | lossless?
        
         | [deleted]
        
         | phoboslab wrote:
         | No, the encoder checks if the color at the hashed value really
         | matches before storing the QOI_INDEX. There's probably better
         | ways to hash it, but this just seemed "good enough".
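          | 
          | Paraphrased sketch of that check (not the exact qoi.h code;
          | the XOR hash here is just an assumed "good enough" one): the
          | hash only picks a slot, and the stored color is compared
          | before QOI_INDEX is emitted, so a collision can only cost
          | some compression, never correctness.
          | 
          |     #include <stdint.h>
          |     #include <string.h>
          | 
          |     typedef struct { uint8_t r, g, b, a; } px_t;
          | 
          |     /* returns 1 if px is already at its slot in the index */
          |     static int try_index(px_t index[64], px_t px, int *slot) {
          |         *slot = (px.r ^ px.g ^ px.b ^ px.a) % 64;
          |         if (memcmp(&index[*slot], &px, sizeof px) == 0)
          |             return 1;       /* exact match: emit QOI_INDEX */
          |         index[*slot] = px;  /* else remember it, use another op */
          |         return 0;
          |     }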
        
       | formerly_proven wrote:
       | This is pretty neat. A lot like the filtering done in PNG, just
       | without slapping zlib on the result. As far as SIMD goes, I agree
       | with the author and don't see much potential, as instructions are
       | variable-length and most instructions encode deltas. It's
       | probably quite close to optimal by just letting the CPU figure it
       | out on the micro-level.
       | 
       | If each line were encoded independently they could be en- and
       | decoded in parallel (once you add a skip instruction to tell
       | where the next line starts, this does preclude streaming of
       | intra-line output though). The hit to compression ratio should be
       | small as long as images are wide enough for the color buffer
       | reset to not matter.
        
       | pedrocr wrote:
       | These kinds of formats have been somewhat explored by camera
       | makers for their raw formats. Since the computing resources in
       | the cameras can be quite limited there are a few formats that try
       | to occupy this space of simplicity with still some tricks for
       | compression. Samsung's various SRW encodings remind me most of
       | this:
       | 
       | https://github.com/pedrocr/rawloader/blob/a59bb78d156277781a...
        
         | IshKebab wrote:
         | Yeah I don't think there are any new ideas here. RLE,
         | differential encoding and dictionary encoding have all existed
         | for decades.
         | 
         | The interesting thing is that it comes close to PNG.
        
         | formerly_proven wrote:
         | Lossy RAW is sometimes even simpler than that. E.g. Nikon's old
         | compression was just remapping values in a log-like fashion,
         | which reduced resolution in the brightest few stops[1] to save
         | a few bits per sample.
         | 
         | [1] Perception is pretty logarithmic as usual, so for a 14-bit
         | readout, half the values represent the brightest stop and a
         | little bit, the other half holds the other 12+ stops.
        
           | pedrocr wrote:
           | The simplest form is just to pack 10 or 12 bit values.
           | There's all kinds of variations of that:
           | 
           | https://github.com/pedrocr/rawloader/blob/a59bb78d156277781a.
           | ..
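            | 
            | The 12-bit flavor is simple enough to sketch: two samples
            | packed into three bytes. This is one possible byte order;
            | real raw formats differ in ordering and row padding:
            | 
            |     #include <stdint.h>
            | 
            |     /* a and b are 12-bit samples (0..4095) */
            |     static void pack12(uint16_t a, uint16_t b,
            |                        uint8_t out[3]) {
            |         out[0] = a >> 4;              /* high 8 bits of a */
            |         out[1] = ((a & 0x0f) << 4)    /* low 4 bits of a  */
            |                | (b >> 8);            /* high 4 bits of b */
            |         out[2] = b & 0xff;            /* low 8 bits of b  */
            |     }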
           | 
            | There are a few formats that use a curve and fewer bits. They
           | do become lossy and doing dithering on decompress is useful
           | to avoid banding.
           | 
           | The Nikon one you mention was only used very early and is
           | decoded by decode_8bit_wtable() in that file. It's just
           | looking up the 8 bit value in the table and then adding some
           | randomness to prevent the banding.
        
       | nightcracker wrote:
        | QOI_DIFF24 {
        |     u8 tag :  4;   // b1110
        |     u8 dr  :  5;   // 5-bit   red channel difference: -15..16
        |     u8 dg  :  5;   // 5-bit green channel difference: -15..16
        |     u8 db  :  5;   // 5-bit  blue channel difference: -15..16
        |     u8 da  :  5;   // 5-bit alpha channel difference: -15..16
        | }
       | 
       | It bothers me more than it should that the 5-bit signed
       | differences aren't -16..15 matching two's complement but -15..16
       | instead.
        
         | hoseja wrote:
         | Yes, this is baffling. Wonder if the format could still be
         | amended to fix this?
        
         | selcuka wrote:
         | Who knows? Maybe it's the result of a heuristic that achieves a
         | slightly better compression ratio.
        
         | phoboslab wrote:
         | I should probably fix that. Is there a clever way to unpack a
          | 5-bit signed int other than treating it as unsigned and
         | subtracting 16?
        
           | benlivengood wrote:
           | You'll have to shift the values around anyway, so:
           | 
            |     // N chosen to put the five bits in the highest
            |     // positions in the int
            |     int intermediate = byte_or_word << N;
            |     int value = intermediate >> (bits_per_int - 5);
            | 
            | Instead of masking, rely on arithmetic shift right to keep
            | the sign bit.
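            | 
            | A self-contained version of that trick for a 5-bit field,
            | assuming the usual arithmetic right shift on signed ints
            | (technically implementation-defined in C, but what
            | mainstream compilers do). It gives the two's-complement
            | -16..15 range discussed above:
            | 
            |     #include <stdint.h>
            |     #include <stdio.h>
            | 
            |     /* field holds the 5 raw bits, 0..31 */
            |     static int unpack5(uint32_t field) {
            |         return (int32_t)(field << 27) >> 27;
            |     }
            | 
            |     int main(void) {
            |         /* prints: 0 15 -16 -1 */
            |         printf("%d %d %d %d\n", unpack5(0), unpack5(15),
            |                unpack5(16), unpack5(31));
            |         return 0;
            |     }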
        
           | adgjlsfhk1 wrote:
           | That seems easiest.
        
             | wizzwizz4 wrote:
             | Especially since your compiler will make it a super magical
             | optimised version.
        
       | makapuf wrote:
       | Would be interesting to try to use it as is but in a lossy way
        
         | andrewla wrote:
          | Very much this -- it will unfortunately have a horizontal bias
          | if done naively. I've been exploring the space of doing lossy
          | preprocessing to improve image compression, but the layered
          | heuristics in PNG make this difficult. This format is simple
          | enough that we have a number of approximation heuristics --
          | find the closest color in the last 64, use a smaller delta, or
          | preserve one or two channels' colors in the next pixel.
        
       | junon wrote:
       | To answer the author's question: No, SIMD would not help with
       | this.
       | 
       | This is neat. I wonder if the author would be willing to write a
       | Kaitai Struct definition for it.
       | 
        | Something else interesting: QOI is 1, 1, 2 letters off from PNG.
       | I'm quite certain this is an accident but it's interesting
       | nonetheless.
        
         | jiggawatts wrote:
         | I suspect SIMD would help with the encoding. The lookup table
         | is small enough to fit into 8 AVX2 registers, so instead of
         | hashing, you could use direct lookup, which would improve
         | compression ratio further (a little bit).
        
           | junon wrote:
           | Yes encoding might benefit of course, I was more considering
           | decoding speed I suppose.
        
             | jiggawatts wrote:
             | There are some clever tricks that can be pulled with the
             | latest instructions sets like AVX-512. The registers are
             | huge and the instructions available are so varied that
             | there are clever ways to use them in "off label" ways to
             | implement lookup tables and bit-level parsers.
        
           | Const-me wrote:
           | > The lookup table is small enough to fit into 8 AVX2
           | registers
           | 
           | Indeed.
           | 
           | > so instead of hashing, you could use direct lookup
           | 
            | However, I don't think that part is gonna work. See the code
           | using that table:
           | https://github.com/phoboslab/qoi/blob/master/qoi.h#L324-L328
           | 
           | SIMD registers aren't indexable (at least not on AMD64), the
           | register needs to be known to the compiler.
           | 
           | Lanes within each register aren't indexable either. The
           | insert and extract instructions are encoding lane index in
           | the code. There're workarounds for this one, like abusing
           | vpshufb or vpermd, but with that amount of overhead I doubt
           | SIMD will deliver any profit at all.
        
       | dahart wrote:
       | This is pretty neat! The compression rates and encode/decode
       | rates are impressive for such a simple rule set.
       | 
       | > SIMD acceleration for QOI would also be cool but (from my very
       | limited knowledge about some SIMD instructions on ARM), the
       | format doesn't seem to be well suited for it. Maybe someone with
       | a bit more experience can shed some light?
       | 
       | I don't know about SIMD encoding, but for decoding at least, the
       | core problem is to make some unit of the image (pixels,
       | scanlines, blocks, etc) independently addressable (readable). The
       | current format is complicated due to pixels of different sizes;
       | there's no way to know where data for pixel 1000 starts without
       | reading all preceding 999 pixels. You could build a prefix-sum
       | array of pixel indices, but that will fully undermine your
       | compression.
       | 
       | One question might be how well does QOI work on independent scan
       | lines or blocks? Is the compression still pretty good if the
       | look-back window is limited in size and restarts every once in a
       | while?
       | 
       | To make a GPU capable format, the very first thing you would
       | probably need to do is separate the tag bits from all the other
       | data, so a thread could identify the pixel type by direct lookup
       | into the tag buffer. It then needs to know where to go to find
       | the remaining data, so you'd probably need at least one byte of
       | data that could be used to reliably locate the rest of the data.
       | I don't see an easy way to make this happen for separate pixels,
       | but it might not be bad if the image is encoded in blocks or
       | scanlines.
        
         | mmastrac wrote:
         | Wouldn't it make sense to transcode to a GPU-specific lossless,
         | compressed format for rendering?
        
           | dahart wrote:
           | Yes, totally. Depends on your use case, but there certainly
           | could be good reasons to keep images in compressed form even
           | GPU RAM. It'd be ideal if decode is both random-access and
           | predictable amounts of work per pixel. This format checks the
           | predictable work box, but would need a little design to make
           | it random access.
        
       | nathias wrote:
       | Very cool.
        
       | javajosh wrote:
       | Very cool - I hope browsers support this soon!
       | 
       | Funny meta comment: when I read " _offering a 20x-50x speedup in
       | compression and 3x-4x speedup in decompression_ " I caught myself
       | reacting with disappointment to the "only" 3x-4x speed up. This
       | is funny because 3x improvement is enormous; it's just small
       | compared to 30x. There's a lesson somewhere in there about
       | sharing stats together to achieve an emotional effect.
        
         | xiphias2 wrote:
         | It's trivial to port this to JS / WebAssembly without losing
         | speed (using bytearrays) -- it's a nice afternoon project,
         | actually.
        
       | Animats wrote:
       | Lossless avoids all those issues around human perception. Also,
       | if you're storing game assets, some of what you're storing as
       | images is not really imagery. It may be normals or height fields,
       | which lossy image compression messes up.
       | 
       | The code is interesting, in that it iterates over the output
       | array, not the input. This helps prevent write buffer overflows.
       | It has read buffer overflows, though. You can craft a compressed
       | file which, when decompressed, will contain info from elsewhere
       | in memory.
        
       | ur-whale wrote:
       | Wondering how well this does on noisy / grainy images since it
       | relies so much on pixel similarity.
        
         | EmilioMartinez wrote:
         | Notably, pixel similarity along a particular order. If I
         | understood correctly, something like a vertical rainbow would
         | compress significantly differently than the same picture rotated.
        
         | makapuf wrote:
         | Noise around recurring or slightly different pixel values would
         | still achieve a 1:4 compression ratio.
        
         | WithinReason wrote:
         | All lossless compression algos do badly on noisy images
        
         | [deleted]
        
       | WaitWaitWha wrote:
       | I am going to be the fly in this ointment.
       | 
       | I appreciate the premise to develop this, and the science that
       | comes with it.
       | 
       | I cannot tell if the author is writing in jest or serious when he
       | is rude to the historical work that was done before him, yet
       | relies on. It comes across as hubris with no floor to hold him
       | up.
       | 
       | Today's genius is often questioned and found to be a fool of
       | tomorrow, and a bit of grace goes a long way.
        
         | kortex wrote:
         | Have you _seen_ codecs? I 've recently been exploring this
         | space and there is not one corner of this universe that isn't a
         | lurking Lovecraftian horror.
         | 
         | Un-compressed formats like BMP are sane I guess but most of
         | those are wrapped in crufty, sparsely documented container
         | formats. Ogg is decent but even it takes quite the deep dive to
         | even begin wrangling it.
        
       | kasper93 wrote:
       | Neat work.
       | 
       | Though I wouldn't be myself if I didn't have a few nitpicks ;)
       | 
       | - You are mixing lossy and lossless compression in your rant. And
       | you are mixing containers with formats. The thing about lossy
       | compression is that it is by design stupidly slow and complex, in
       | order to produce the smallest possible result with the best
       | perceived quality: you compress once and trade time for both
       | image size and quality. Lossless is not that bad, although your
       | simple solution is, well, simple :)
       | 
       | - I feel like the benchmark suite is lacking. For better overview
       | you probably should include libpng results with max compression
       | level and lowest compression level. Lossless modes of AVIF and
       | WEBP would be nice. (also could throw similar project to yours
       | like https://github.com/catid/Zpng) Not saying the benchmark is
       | bad, but IMHO it doesn't paint the full picture. From a quick
       | test I got significantly better compression out of libpng, at the
       | expense of time of course, but you didn't configure libpng for
       | speed either. So we have some results, but they are not really
       | representative IMHO.
       | 
       | - I understand it is pet project, but don't forget about
       | importance of image metadata, like colorspace and so on.
       | 
       | - Also keep in mind that compression is easy at the beginning,
       | but the tricky part is how to squeeze more.
       | 
       | EDIT: formatting...
        
       | jiri wrote:
       | This code is nice for RGB, but what about YUV420 format? There
       | should be an even more efficient way to store an image in YUV420
       | (it takes half the bytes of RGB888), and most images (JPEGs,
       | direct camera output on Android, etc.) are already in YUV420. I
       | wonder if the author has tried YUV420 already and failed for some
       | reason ...
        
       | haxiomic wrote:
       | This is impressive, I was curious if it was just better on
       | simpler images with many duplicate pixels but it's nice to see it
       | working well with photography too
       | 
       | e.g.
       | https://phoboslab.org/files/qoibench/images/wallpaper/EwZDbL...
       |            decode ms  encode ms  decode mpps  encode mpps  size kb
       |   libpng:      148.4     3995.5        55.88         2.08    12223
       |   stbi:        161.0     1858.3        51.50         4.46    19199
       |   qoi:          60.8       95.6       136.49        86.78    12868
       | 
       | I'm interested if there's a standardized benchmark used in the
       | academic literature this could be tested against
       | 
       | In any case the results are impressive enough that this will 100%
       | be used in projects I work on!
       | 
       | Many thanks to the author for their work <3
        
         | nightcracker wrote:
         | When I download that PNG file it is 10785 KB. If I re-encode it
         | using FLIF 0.4 (to my knowledge still the best off-the-shelf
         | lossless image compression) it is 7368 KB. That's 57% of the
         | size of qoi, quite a lot smaller. You pay for it in
         | encode/decode time though.
        
           | fastball wrote:
           | Interestingly I actually managed to shave 400kb off of that
           | PNG by just running it through some PNG optimizers. Not great
            | savings, but there were still some to be had!
        
           | minus7 wrote:
           | FLIF has been superseded[1] by JPEG XL, which, in my very
           | limited testing, performed better (speed and compression-
           | wise; libjxl) than FLIF.
           | 
           | Interesting though that FLIF 0.4 was released 3 days ago. I
           | haven't checked out that new release (and won't), but the
           | previous one wasn't great code and was a pain to use (ended
           | up using its CLI instead of the lib). We ended up going back
           | to PNG because of interoperability and speed. I haven't
           | looked at libjxl's API yet.
           | 
           | Edit: 8605KB in 11.5s with libjxl, so it's not actually
           | better than FLIF 0.4 (7368KB in 23.9s)
           | 
           | [1]: https://flif.info/#update
        
             | nightcracker wrote:
             | Yes, I am aware it has been 'superseded' but in my
             | experience for lossless compression it's still usually
             | better.
        
         | gfody wrote:
         |     pngcrush -brute    10355kb
         |     cjxl -q 100 -e 9    8413kb
        
           | haxiomic wrote:
           | Can be made smaller for sure, but if I'm say working with
           | 10,000 images in my app and decoding all the time, a 3-4x
           | speedup in the decode time can be a major saving for ~ 10-40%
           | increase in file size with a lib I can just drop in my
           | project without effort
        
       | mattashii wrote:
       | Nice stuff! But I was confused by the following:
       |     QOI_RUN8 {
       |         u8 tag  :  3;   // b010
       |         ...
       |     QOI_RUN16 {
       |         u8 tag  :  3;   // b011
       | 
       | I think there might be a bug lurking in those tag values.
       | 
       | EDIT: Wow, I'm blind. Those 3s were bits consumed by the tag, not
       | tag values...
        
         | formerly_proven wrote:
         | The :3 means how wide the field is. Instructions are byte-
         | aligned, but bit-packed.
        
       | lam0x86 wrote:
       | Looks like PCX on steroids
        
       | j-pb wrote:
       | > To keep things O(n) when encoding, there's only one lookup into
       | this array. The lookup position is determined by a "hash" of the
       | rgba value (really just r^g^b^a).
       | 
       | Is a bit imprecise. The algorithm would still be O(n) even with
       | a linear search through the "seen pixel array", as it is bounded
       | to length 64 and therefore a constant factor that gets absorbed
       | by the Big-O notation.
       | 
       | Because of this hash the algorithm seems to do something other
       | than what is first described, though. Instead of "a running
       | array of the 64 pixels it previously encountered", it's actually
       | the last 64 pixel _values_ encountered, which allows for much
       | bigger jumpback.
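       | 
       | A minimal sketch of that value cache (the function name and
       | layout here are illustrative, not taken from qoi.h):
       | 
       |     #include <string.h>
       | 
       |     typedef struct { unsigned char r, g, b, a; } px_t;
       | 
       |     /* Returns the slot if `px` is already cached (emit a 1-byte
       |        index chunk), else caches it and returns -1. One O(1)
       |        lookup per pixel, so encoding stays O(n). */
       |     static int cache_lookup(px_t seen[64], px_t px) {
       |         int slot = (px.r ^ px.g ^ px.b ^ px.a) % 64;
       |         if (memcmp(&seen[slot], &px, sizeof px) == 0)
       |             return slot;
       |         seen[slot] = px;    /* overwrite and fall through to the
       |                                diff / run / full-pixel encodings */
       |         return -1;
       |     }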
       | 
       | By computing this array deterministically during execution, it
       | itself seems to act as a kind of compressed jumpback table,
       | allowing approximate arbitrary jumpback with just 6 bits.
       | 
       | Quite impressive!
       | 
       | Edit: I think there might be an interesting tweak that could
       | increase the effectiveness of the hashing technique used.
       | 
       | If one were to walk and store the pixels in a Z-Curve order, the
       | runtime would stay the same but the locality of the color value
       | pool might be increased.
        
         | tda wrote:
         | I was about to post the same idea about a locality preserving
         | pixel order. I was thinking about Hilbert curves; I had never
         | heard of Z-curves. Would be nice to see if/how that affects the
         | compression ratios. How surprising that a few simple principles
         | can give quite a good image format.
        
           | mkl wrote:
           | https://en.wikipedia.org/wiki/Z-order_curve
           | 
           | Related to this method, it's sometimes used for texture
           | mapping on GPUs.
        
           | olaulaja wrote:
           | Based on some quick* testing, using Z-order instead of going
           | row by row ends up taking 1-10% less space. Most images were
           | on the lower end but I didn't run into any taking more space.
           | 
           | * Swapping to z-order requires about 5 lines of changes if
           | you ignore all the little edge cases like non power-of-two
           | image sizes. Images used in testing may or may not be
           | representative of anything.
        
           | j-pb wrote:
           | After making the edit, I saw that someone made the same
           | suggestion on twitter!
           | 
           | Hilbert curves have slightly better locality than Z-curves,
           | but Z-Curves are trivial to produce.
           | 
           | If you have a N dimensional space over unsigned integers,
           | then you can place each point on a one dimensional Z-Curve by
           | interleaving their bits into a new integer.
           | 
           | So for any point (X, Y) in the X-Y coordinates of an image
           | the Z-Curve coordinate is X_1,Y_1,X_2,Y_2...
           | 
           | Which can be achieved with a bunch of bit shifts and masks,
           | (or a bunch of wires if you happen to use it in an FPGA ^^')
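           | 
           | A sketch of the usual bit-twiddling version for 16-bit
           | coordinates (the masks are the standard Morton-code spread
           | constants):
           | 
           |     #include <stdint.h>
           | 
           |     /* Spread the low 16 bits of v so bit i lands at bit 2*i. */
           |     static uint32_t spread_bits(uint32_t v) {
           |         v &= 0xFFFF;
           |         v = (v | (v << 8)) & 0x00FF00FF;
           |         v = (v | (v << 4)) & 0x0F0F0F0F;
           |         v = (v | (v << 2)) & 0x33333333;
           |         v = (v | (v << 1)) & 0x55555555;
           |         return v;
           |     }
           | 
           |     /* Z-curve (Morton) index of pixel (x, y), both < 65536. */
           |     static uint32_t morton2d(uint32_t x, uint32_t y) {
           |         return spread_bits(x) | (spread_bits(y) << 1);
           |     }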
        
             | formerly_proven wrote:
              | CPUs can do this quickly:
              | 
              |     __m128i src = _mm_set_epi32(0, 0, x, y);
              |     __m128i res = _mm_clmulepi64_si128(src, src, 0);
              |     uint64_t zeorder = _mm_extract_epi32(res, 0) |
              |                        (_mm_extract_epi32(res, 2) << 1);
             | 
             | Though a blocked iteration order should perform better due
             | to better locality.
        
               | j-pb wrote:
               | Thanks for the pointer!
               | 
               | If anybody else want to know how this works:
               | 
               | https://stackoverflow.com/questions/30539347/2d-morton-
               | code-...
               | 
               | https://en.wikipedia.org/wiki/Carry-less_product
        
             | user-the-name wrote:
             | Hilbert curves are not _that_ much more complex. Maybe ten
             | or so lines of code. Interleaving bits is already a pretty
              | complicated operation if you don't have builtins for it.
        
               | a_e_k wrote:
               | I agree that de-interleaving is fairly complicated, but
               | interleaving can be done fairly easily with a lookup from
               | a table for each dimension and then combining them:
               | xbits[x] | ybits[y] | channelbits[c].
               | 
               | The nice thing about this is that just by changing the
               | tables, you can get things like z-curve, tiles, row-
               | major, or column-major pixel orders (or even
               | combinations), plus swizzled, interleaved, or planar
               | layouts for the color channels.
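               | 
               | A sketch of that table approach for one 256x256 tile,
               | with the channel bits omitted (table and function names
               | are made up for illustration):
               | 
               |     #include <stdint.h>
               | 
               |     static uint32_t xbits[256], ybits[256];
               | 
               |     static uint32_t spread8(uint32_t v) {  /* bit i -> 2*i */
               |         v = (v | (v << 4)) & 0x0F0F;
               |         v = (v | (v << 2)) & 0x3333;
               |         v = (v | (v << 1)) & 0x5555;
               |         return v;
               |     }
               | 
               |     static void init_tables(void) {
               |         for (int i = 0; i < 256; i++) {
               |             xbits[i] = spread8(i);
               |             ybits[i] = spread8(i) << 1;
               |         }
               |     }
               | 
               |     /* Z-order index within the tile: xbits[x] | ybits[y] */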
        
             | mzs wrote:
             | Do you have a link to discussions about QOI on twitter?
        
               | j-pb wrote:
               | https://twitter.com/WAHa_06x36/status/1463456861978599426
               | ?s=...
        
             | phkahler wrote:
             | I added Z-order curve to a ray tracer many years ago.
             | Rather than using nested loops over X and Y pixel
             | coordinates I used a single loop and de-interleaved the
             | count to get X and Y. My thought was that this would
             | increase the cache hit rate due to the locality of the
             | pixels. It worked and the speedup was on the order of 10
             | percent. Because images are rarely power of 2 or even the
             | same in both dimensions I opted to make a list of 32x32
             | blocks and rendered the blocks in Z-order as well as the
             | pixels within (which still helped vs just tiling).
             | 
             | Locality is a good thing to exploit.
        
         | ErikCorry wrote:
         | I wonder how much worse the compression would be if you just
         | had the last 64 colors, but didn't update when using the run
         | length byte codes. Seems like it would be close to the same,
         | and the decoder wouldn't need to hash the pixels.
        
         | sltkr wrote:
         | > Because of this hash the algorithm seems to do something
         | other than what is first described, though. Instead of "a
         | running array of the 64 pixels it previously encountered",
         | it's actually the last 64 pixel _values_ encountered, which
         | allows for much bigger jumpback.
         | 
         | It's somewhere in between. Because these values are hashed into
         | single slots, assuming that hash codes are random, the
         | probability that a value is still in the table after k
         | intervening values is pow(63/64, k).
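         | For example, after 64 intervening values that is (63/64)^64,
         | which is roughly 1/e or about 0.37, so an entry is more likely
         | than not to have been evicted by then.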
         | 
         | That means the table is unlikely to contain all 64 values, but
         | it will also retain some older values. Since older values are
         | less likely to be useful in the future the encoding performance
         | is likely somewhat worse than if the algorithm used strictly
         | the last 64 distinct values.
         | 
         | This means that the algorithm is a bit sensitive to hash
         | collisions, which seem particularly likely for primary colors
         | (i.e. with each of the components at value 0 or 255):
         |   - Slot 0: black, ALL SHADES OF magenta (x ^ 0 ^ x = 0),
         |     yellow (etc.) and cyan.
         |   - Slot 15: white (255 ^ 255 ^ 255 = 255), red
         |     (255 ^ 0 ^ 0 = 255), green (etc.), blue.
         | 
         | Except black and white, these are unlikely to occur often in
         | photographic images but could well occur in bitmaps with a
         | handcrafted palette, or when the RGB values are converted from
         | a system with a lower color space. For example, the 16 color
         | CGA palette maps to just 3 different slots.
        
           | j-pb wrote:
           | Yeah, one should probably add a randomisation step for each
           | channel for good measure.
           | 
           | With random byte->byte lookup tables those costs should be
           | hidden by the cache.
           | 
           | so hash = HR[r] ^ HG[g] ^ HB[b] ^ HA[a].
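           | 
           | Something like this (table names as above; rand() is only
           | used for illustration -- in practice the tables would have
           | to be a fixed part of the format so encoder and decoder
           | agree):
           | 
           |     #include <stdint.h>
           |     #include <stdlib.h>
           | 
           |     static uint8_t HR[256], HG[256], HB[256], HA[256];
           | 
           |     static void init_hash_tables(unsigned seed) {
           |         srand(seed);
           |         for (int i = 0; i < 256; i++) {
           |             HR[i] = (uint8_t)rand();
           |             HG[i] = (uint8_t)rand();
           |             HB[i] = (uint8_t)rand();
           |             HA[i] = (uint8_t)rand();
           |         }
           |     }
           | 
           |     static int color_slot(uint8_t r, uint8_t g, uint8_t b,
           |                           uint8_t a) {
           |         return (HR[r] ^ HG[g] ^ HB[b] ^ HA[a]) % 64;
           |     }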
        
           | ball_of_lint wrote:
           | In an image with very limited colors you would have long runs
           | of the same color and it would compress well from just the
           | run-length encoding alone.
           | 
           | Also, it's non-obvious that older values will be less useful.
           | Going through the pixels in order like this doesn't preserve
           | locality well, so older values may actually be closer to a
           | given pixel than more recent ones.
        
             | MauranKilom wrote:
             | > In an image with very limited colors you would have long
             | runs of the same color and it would compress well from just
             | the run-length encoding alone.
             | 
             | ...unless those images are dithered. Which may be a
             | relatively common thing to do in small color spaces!
        
       | kortex wrote:
       | Super cool work! I was also mucking around with codecs last week.
       | I got some surprisingly good results from delta encoding each
       | channel separately and then zstandard compression. Dirt simple,
       | crazy fast.
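       | 
       | Roughly this kind of per-channel filter (a sketch only; the
       | actual pipeline, and handing the result to zstd, is whatever
       | worked in that experiment):
       | 
       |     #include <stddef.h>
       |     #include <stdint.h>
       | 
       |     /* Delta-filter one channel in place: each byte becomes the
       |        difference to the previous byte (mod 256). The output is
       |        what then goes into a general-purpose compressor. */
       |     static void delta_encode(uint8_t *buf, size_t n) {
       |         uint8_t prev = 0;
       |         for (size_t i = 0; i < n; i++) {
       |             uint8_t cur = buf[i];
       |             buf[i] = (uint8_t)(cur - prev);
       |             prev = cur;
       |         }
       |     }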
       | 
       | One trap to be aware of when profiling image codecs is not
       | knowing the provenance of the test images. Many images will have
       | gone through one or more quantization steps. This can lead to
       | impressive compression ratios, but if you compare with the
       | results on raw, unquantized imagery, it's less of a slam dunk.
       | 
       | That said, PNG is a _really_ low bar to clear, both in terms of
       | speed and compression ratio. Even really image-naive compression
       | algorithms tend to be within a factor of 2 in resulting
       | compressed size. You tend to get diminishing returns. 10% vs 20%
       | is twice as efficient, but starting with a 10MB file, 1 vs 2 MB
       | file sizes isn't a huge delta. (Yes, I know there are domains
       | where this matters, don't @ me).
        
         | simcop2387 wrote:
         | > Super cool work! I was also mucking around with codecs last
         | week. I got some surprisingly good results from delta encoding
         | each channel separately and then zstandard compression. Dirt
         | simple, crazy fast.
         | 
         | My understanding is that this is one of the techniques that
         | PNG uses (though not zstd) for its compression too, along
         | with, I think, some different strides down the image and some
         | other stuff I don't fully understand. That's what optipng and
         | friends play with for parameters to find the best settings
         | for decomposing the image for compression.
         | 
         | If you look at what JPEG and similar codecs do, they
         | essentially take a DC offset out of the image, because that
         | difference then greatly reduces the range of values you have
         | to encode, which means you need fewer bits to do so. Combine
         | that with range or arithmetic coding and you can get decent
         | efficiency for very little work. Then with LZW, zstd, etc.
         | you can get lots of other patterns taken out and compressed.
         | Lossy codecs try to take out the stuff we won't see with our
         | eyes, so that the set of values to be encoded needs fewer
         | bits, and it also makes more patterns appear that can be
         | exploited.
        
           | kortex wrote:
            | Yeah, my technique was equivalent to filter mode 0 (the only
            | mode), filter type 1 ("Sub": subtract the pixel to the
            | left) -- literally np.diff(arr.ravel(), prepend=0) in numpy.
           | 
           | I think part of the reason png is so slow is that it
           | empirically tests to find the best one.
           | 
           | > Compression is further improved by choosing filter types
           | adaptively on a line-by-line basis.
           | 
           | I'd be curious to unpack a png and see the statistics of how
           | much each kind is used.
        
       | janandonly wrote:
       | Waawh, one would think (certainly, I personally thought) that all
       | the _simple_ compression algorithms would have been discovered
       | and built by now...
        
         | rowanG077 wrote:
         | Interesting I have the opposite view. From intuition I would
         | think there is an unbounded set of simple compression
         | algorithms. You can probably make much more effective simple
         | compression algorithms by making basic assumptions about the
         | type of image.
        
           | EmilioMartinez wrote:
           | This is patently so. Taken to absurd proportions, you can
           | compress a specific rendition of the Mona Lisa with just one
           | bit. 1 means Mona, and 0 is undefined behaviour.
        
             | mijoharas wrote:
             | Are we about to talk about Kolmogorov complexity? You're
             | damn right that we're about to talk about Kolmogorov
             | complexity! :)
             | 
             | [0] https://en.wikipedia.org/wiki/Kolmogorov_complexity
        
             | FooBarBizBazz wrote:
             | Taking this a step further: 0 means that all subsequent
             | bits contain an arbitrary PNG. Now your algorithm encodes
             | _any_ image, and still does Mona Lisa in one bit.
        
               | selcuka wrote:
               | It's also almost as effective as PNG (plus one bit) for
               | non-Mona Lisa images.
        
         | vintermann wrote:
         | Well, this is a combination of simple approaches which have all
         | been used before, though put together in a nice way. And it's
         | not benchmarked against modern formats like lossless webp or
         | jpeg-xl.
        
       | lifthrasiir wrote:
       | tl;dr:
       | 
       |     00xxxxxx          - copy (x+1)-th last EXPLICITLY ENCODED
       |                         pixel (i.e. ignoring repeats)
       |     010xxxxx          - repeat the last pixel x+1 times
       |     011xxxxx xxxxxxxx - repeat the last pixel x+33 times
       |     10rrggbb          - copy the last pixel and adjust RGB by
       |                         (r-1, g-1, b-1)
       |     110rrrrr ggggbbbb - copy the last pixel and adjust RGB by
       |                         (r-15, g-7, b-7)
       |     1110rrrr rgggggbb bbbaaaaa
       |                       - copy the last pixel and adjust RGBA by
       |                         (r-15, g-15, b-15, a-15)
       |     1111RGBA [rrrrrrrr] [gggggggg] [bbbbbbbb] [aaaaaaaa]
       |                       - copy the last pixel and replace RGBA if
       |                         each bit flag is set
       | 
       | So it is essentially the PNG filter limited by Sub, but offers
       | better delta coding and a larger history window. My guess is that
       | it might not work well if the image has a strong vertical
       | gradient (which would need the vertical context), but
       | nevertheless it's pretty impressive that this simple coding is
       | not all that inefficient.
        
         | bonzini wrote:
         | > 00xxxxxx - copy (x+1)-th last EXPLICITLY ENCODED pixel (i.e.
         | ignoring repeats)
         | 
         | Not exactly. It's "copy the last pixel color for which
         | (r^g^b^a)&63 == x".
        
           | lifthrasiir wrote:
           | Oh you are right, I only took a cursory look at the header
           | file and thought the hashing is only used to speed up the
           | lookup.
        
         | miohtama wrote:
         | This is so neat. Reminds me of the recent LZ4 post showing
         | that by manually hand-tweaking the encoding one could still
         | get a decent compression boost after all these decades.
        
         | userbinator wrote:
         | I see it as a variant of LZ specialised to images, so its
         | performance should not be too surprising. Running an
         | uncompressed bitmap through a general-purpose compression
         | algorithm tends to yield similar results to PNG.
        
         | chrisdew wrote:
         | The current encoding is complete, so no future extensions are
         | possible.
         | 
         | A fix would be to change:
         | 
         |     110rrrrr ggggbbbb - copy the last pixel and adjust RGB by
         |                         (r-15, g-7, b-7)
         | 
         | to:
         | 
         |     1100rrrr ggggbbbb - copy the last pixel and adjust RGB by
         |                         (r-7, g-7, b-7)
         | 
         | This leaves:
         | 
         |     1101???? ...
         | 
         | available for future extensions.
        
           | HanClinto wrote:
           | I really like this.
        
         | mkl wrote:
         | You might be able to do better compression by traversing the
         | pixels in Hilbert [1] order or Z [2] order, to take into
         | account vertical neighbours as well as horizontal. It might
         | cost time or space though, as you couldn't stream the pixels
         | through it anymore - you'd need them all in memory.
         | 
         | [1] https://en.wikipedia.org/wiki/Hilbert_curve
         | 
         | [2] https://en.wikipedia.org/wiki/Z-order_curve
        
           | EdSchouten wrote:
           | The nice thing is if M >= B^2 (i.e., total memory is large
           | enough to fit a square region of the image, where each
           | row/column of the square fits a full block/page of memory),
           | you can transform from row/column order to Hilbert/Z-order
           | without needing to do more I/Os.
           | 
           | So you can't do such a conversion in a streaming fashion, but
           | there is no need to load all data in memory either.
        
             | andai wrote:
             | See also: Intel's guide to looping over smaller chunks of
             | 2D arrays for better cache utilization:
             | 
             | https://www.intel.com/content/www/us/en/developer/articles/
             | t...
        
               | EdSchouten wrote:
               | Yeah, those images explain it pretty well. There is one
               | slight difference, though.
               | 
               | Because the Hilbert/Z-order curves are defined
               | recursively, algorithms for converting between the curve
               | and row/column order are "cache-oblivious", in that they
               | don't need to take an explicit block size. You write it
               | once, and it performs well on any system regardless the
               | CPU cache size and/or page size.
        
           | andai wrote:
           | I wonder if the pixel data were transformed to/from a
           | different order, before and after the
           | compression/decompression, if that would speed things up
           | without introducing too much slowdown of its own?
        
           | nightcracker wrote:
           | You could still stream the pixels, you'd just have to stream
           | them in Z order. This may seem pedantic, but this applies to
           | any mismatch between the order in which you store pixels and
           | the order in which the format stores pixels.
           | 
           | E.g. some file formats may be row-major, others column-major,
           | some block-based (like JPEG) or a fancy interleaving scheme
           | (like Adam7). Some might store the channels interleaved,
           | others separate the channels. If any of these choices doesn't
           | match the desired output format, it breaks streaming.
        
             | akx wrote:
             | And formats like BMP are stored upside down...
        
           | adrianN wrote:
           | Iteration order through arrays has a big impact on
           | performance due to cache effects. I'd guess that you'd lose a
           | lot of performance by a complicated iteration scheme.
        
       | [deleted]
        
       | sbrki wrote:
       | Very nice.
       | 
       | Regarding the comments about complexity of existing formats - I
       | agree, with the exception of PNG. In terms of complexity, QOI is
       | very similar to PNG's filtering phase, i.e. when you strip the
       | primary compression phase (Huffman coding) from PNG.
       | 
       | Also, I admire how the author combined a lot of common and cool
       | tricks into QOI:
       | 
       | - Block tags form a sort of a prefix code, making it easy to
       | differentiate block types with differing sizes
       | 
       | - Look-back array is used (common trick from LZW family)
       | 
       | - Delta-coding is used when storing pixel diffs
       | 
       | - In the end, if nothing else works, full pixel value is written
       | 
       | Also, the fact that everything is byte aligned is very
       | appreciated.
       | 
       | This is a breath of fresh air and really impressive how good
       | performance (both processing and size reduction) this gets with
       | so little (very readable!) C.
        
         | superjan wrote:
         | Traditional compression research usually prioritized
         | compression ratio, so more images would fit on your floppy
         | disks and were faster to download over 28.8k modems. It usually
         | means increased complexity.
         | 
         | PNG has its own story, as its #1 design goal was to avoid
         | techniques that were patented at the time. So they combined a
         | simple set of prepasses with zip/gzip compression. There were
         | better techniques known at the time, but the idea was that
         | combining simple, and known patent-free solutions was the
         | safest way to achieve a free format.
        
           | BlueTemplar wrote:
            | The most important thing is probably the difference between
            | lossy and lossless compression...
            | 
            | (The GP-mentioned Bink is lossy! Then he also for some
            | reason uses a picture tending towards photorealistic, which
            | does NOT play well with lossless!)
        
             | superjan wrote:
             | Yeah there's a difference in techniques used for lossy vs
             | lossless. But my point was that in compression research,
             | runtime performance was much less important than
             | compression performance, that was no different for lossy
             | compression.
        
               | jijji wrote:
               | Speaking of runtime performance: if you look at LZMA,
               | for instance, it's horrible on runtime performance but
               | it produces some of the best compression ratios of all
               | of them.
        
         | khoobid_shoma wrote:
         | Yeah, really nice and efficient C programming with a great
         | idea!
        
       | KaiserPro wrote:
       | > The basic ideas for video compression in MPEG are ingenious,
       | even more so for 1993, but the resulting file format is an
       | abomination.
       | 
       | can confirm.
       | 
       | Although the worst part is that the documentation is in
       | engineering English, so it's quite hard to understand. MPEG-2 is
       | actually not as bad as you'd imagine. MPEG-4 is.
        
       | pantulis wrote:
       | This is as cool as it gets.
       | 
       | I have an honest question about the source code. It's been a
       | long time since I coded anything meaningful in C, but I remember
       | that header files did not have that much code in them, while
       | here I see most of the code is in the header file. Why is this
       | the case? Inline compilation? What are the advantages?
        
         | CornCobs wrote:
         | Single header libraries are hugely more ergonomic and easy to
         | use. Just get the file, #include it and you're done! Most C
         | libraries on GitHub prefer this mode of distribution now
        
           | phkahler wrote:
           | >> Single header libraries are hugely more ergonomic and easy
           | to use. Just get the file, #include it and you're done! Most
           | C libraries on GitHub prefer this mode of distribution now
           | 
           | I find that odd. How is that significantly better than
           | dropping _2_ files into a project and #including the header?
           | A good build system should just compile all the C or C++
           | files in a given place, so nothing needs to be changed other
           | than dropping the files in and #include where called from.
        
             | hn_acc_2 wrote:
             | The difference will be painfully clear once you try to walk
             | a junior developer through installing and linking new files
             | to a large project in Visual Studio, vs "copy this .h file
             | here and include it".
             | 
             | Whether or not a "good build system" should handle it, the
             | fact that single-file libraries are much preferred these
             | days should demonstrate most people don't have such a build
              | system.
        
             | rytill wrote:
             | Why use two files instead of one?
        
               | bluedino wrote:
               | What if you want to use the functions in other C files?
               | What if you want to compile them all separately?
        
               | hn_acc_2 wrote:
               | 1) Include it wherever it's used. The convention is to
               | use the preprocessor[1] so it only appears once in the
               | final binary
               | 
               | 2) Most developers just want to compile to a single
               | binary. Any developer who for some reason needs
               | separately compiled objects should be able to quickly
               | achieve that from a single-file library
               | 
               | [1] https://github.com/phoboslab/qoi/blob/324c2243b2fb35f
               | 741a88d...
        
         | clownpenis_fart wrote:
          | A lot of people seem allergic to using the linker these days.
          | 
          | As a result you have to inline all their code, which is
          | usually riddled with warnings, so you can't compile with sane
          | options like -Werror anymore.
         | 
         | It's not like using an actual library is really hard in the age
         | of cmake and meson.
        
         | lifthrasiir wrote:
         | There is a strong tradition of single-file C/C++ libraries [1]
         | because it has been difficult to integrate larger C/C++
         | libraries into the codebase.
         | 
         | [1] https://github.com/p-ranav/awesome-hpp
        
           | flohofwoe wrote:
           | There's an important difference between STB-style 'single
           | file libraries', and C++ 'stdlib-style' header-only-
           | libraries: STB-style libraries put the implementation code
           | into a separate section inside an "#ifdef IMPLEMENTATION"
           | block, while most C++ header-only-libraries use inline code
           | (and thus pay a higher compilation cost each time this header
           | is included).
           | 
           | STB-style libraries skip the implementation already in the
           | preprocessor and compile the implementation only once for the
           | whole project.
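           | 
           | A minimal sketch of the pattern (all names here are made up
           | for illustration):
           | 
           |     /* mylib.h -- STB-style single-file library */
           |     #ifndef MYLIB_H
           |     #define MYLIB_H
           |     int mylib_add(int a, int b);   /* declarations, seen by
           |                                       every includer */
           |     #endif
           | 
           |     #ifdef MYLIB_IMPLEMENTATION    /* definitions, compiled
           |                                       in exactly one TU */
           |     int mylib_add(int a, int b) { return a + b; }
           |     #endif
           | 
           | Every user just does #include "mylib.h"; exactly one source
           | file defines MYLIB_IMPLEMENTATION before the include.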
        
             | sltkr wrote:
             | So what's the compelling reason to ship these as a single
             | source file, instead of using the more logical and standard
             | solution of a single header and a single source file?
             | 
             | Unlike C++ header-only libraries, C libraries like this
             | still need their implementation included exactly once.
             | 
             | I understand the benefits of keeping the implementation in
             | a single source file (simplifies integrating in existing
             | build systems, interprocedural optimizations, etc.) but
             | combining the source and header files too makes them
             | awkward to use, with little benefit beyond being able to
             | claim "our implementation is only 1 file!" which is about
             | as impressive as saying "our implementation is only 1
             | line!" if you've just deleted all the whitespace.
        
               | lmm wrote:
               | > So what's the compelling reason to ship these as a
               | single source file, instead of using the more logical and
               | standard solution of a single header and a single source
               | file?
               | 
               | C doesn't have a packaging format or dependency manager,
               | so having to keep track of two files would be pretty
               | cumbersome.
        
               | flohofwoe wrote:
               | In general, STB-style headers simplify C/C++ build system
               | shenanigans.
               | 
               | E.g. one benefit is that you can configure the
               | implementation via preprocessor defines without the
               | defines leaking into the build system. Just add the
               | config defines in front of the implementation include.
               | 
               | It's also trivial to write "extension headers" which need
               | to directly peek into the (otherwise private)
               | implementation of the base library. Just include the
               | extension header implementation after the base library
               | implementation.
               | 
               | PS: but also see: https://github.com/nothings/stb#why-
               | single-file-headers
        
               | sltkr wrote:
               | > In general, STB-style headers simplify C/C++ build
               | system shenanigans.
               | 
               | Too vague. How is it better, concretely?
               | 
               | > E.g. one benefit is that you can configure the
               | implementation via preprocessor defines without the
               | defines leaking into the build system
               | 
               | That's not a benefit. How is this:
               |     #define LIB_IMPLEMENTATION
               |     #define LIB_KNOB 123
               |     #include "lib.h"
               | 
               | Any better than the two-file version:
               |     #define LIB_KNOB 123
               |     #include "lib.c"
               | 
               | ?
               | 
               | > It's also trivial to write "extension headers" which
               | need to directly peek into the (otherwise private)
               | implementation of the base library. Just include the
               | extension header implementation after the base library
               | implementation.
               | 
               | I'm not sure what this means. Can you give an example? Is
               | this being used in the above library?
               | 
               | > PS: but also see: https://github.com/nothings/stb#why-
               | single-file-headers
               | 
               | OK, that only offers this explanation:
               | 
               | > Why not two files, one a header and one an
               | implementation? The difference between 10 files and 9
               | files is not a big deal, but the difference between 2
               | files and 1 file is a big deal. You don't need to zip or
               | tar the files up, you don't have to remember to attach
               | two files, etc.
               | 
               | This is kind of nonsense to me. The difference between 2
               | files and 1 file is barely relevant, and doesn't weigh
               | against the awkwardness of having no clear distinction
               | between declarations and definitions, and having to pull
               | in definitions by setting a magic preprocessor header.
        
               | flohofwoe wrote:
               | > Any better than the two-file version:
               | >     #define LIB_KNOB 123
               | >     #include "lib.c"
               | 
               | See, now you have 3 files, the .h/.c file pair of the
               | library, and your own implementation source file with the
               | configuration which includes a .c file. How is this
               | better than having two files?
               | 
               | > Too vague. How is it better, concretely?
               | 
               | I can only assume from that question that you haven't had
               | the "pleasure" to work much with C/C++ build systems yet,
               | at least when it comes to integrating 3rd party
               | libraries.
               | 
               | Re. extension headers:
               | 
               |     #define IMPLEMENTATION
               |     #include "base_library.h"
               |     #include "extension_library.h"
               | 
               | Since the implementation of the base library is in the
               | same compilation unit as the extension library, the
               | extension library has direct access to the private
               | implementation details of the base library without having
               | to add a separate 'private but actually public API'.
               | 
               | > no clear distinction between declarations and
               | definitions
               | 
               | STB-style libraries have this clear distinction, C++
               | style header-only libraries don't.
        
               | sltkr wrote:
               | > See, now you have 3 files, the .h/.c file pair of the
               | library, and your own implementation source file with the
               | configuration which includes a .c file.
               | 
               | The third source file is what the library authors
               | suggested; I'd just create a separate Makefile target for
               | the implementation, which is then linked into the binary.
               | (Yes, that's an extra target, but that costs very little
               | and allows you to avoid unnecessary rebuilds.)
               | 
               | Why don't you give me a concrete counterexample where the
               | single-header-setup makes things _simpler_? If it 's so
               | obviously beneficial it shouldn't be so hard to come up
               | with a concrete example.
               | 
               | >     #define IMPLEMENTATION
               | >     #include "base_library.h"
               | >     #include "extension_library.h"
               | 
               | Thanks for explaining, but that doesn't really motivate
               | having a single file, since you could just as easily do:
               | #include "base_library.c"       #include
               | "extension_library.c"
               | 
               | And you've saved a line (and removed the need for obscure
               | preprocessor macros).
               | 
               | > ... without having to add a separate 'private but
               | actually public API'.
               | 
               | Personally, I'd say this is exactly the right way to
               | model this, but even if you disagree: it sounds like this
               | simplifies things mainly for the library/extension
               | authors, and not the library users, because if the
               | extension author forward-declares the implementation
               | details they depend on, there is no additional file to
               | manage for the users either.
        
               | wizzwizz4 wrote:
               | > _The difference between 2 files and 1 file is barely
               | relevant,_
               | 
               | To you. To me doing Rust stuff, it's been quite helpful
               | to have single-file C programs; it makes it a lot easier
               | to compile and link things.
        
         | ur-whale wrote:
         | Packing the entire library in a header file is a somewhat
         | recent and much welcome trend.
         | 
         | See for example:
         | 
         | https://github.com/nothings/stb
         | 
         | https://github.com/nothings/single_file_libs
        
           | mkl wrote:
           | I wouldn't call it recent. Boost has been doing it since it
           | was started in the late 1990s.
        
             | ChrisMarshallNY wrote:
             | The original C++ templates used to be entirely contained in
             | the headers. I assume that is no longer the case, but it
              | has been a long time since I wrote C++.
             | 
             | When I was writing template SDKs, the headers were pretty
             | much the entirety of the library.
        
             | flohofwoe wrote:
             | Boost or C++ stdlib headers are different than STB-style
             | single-file-libraries though in that the implementation
             | code is inline and needs to be parsed each time the header
             | is included.
        
             | tommiegannert wrote:
             | Boost did it because C++ template definitions must be in
             | headers.
             | 
              | That seems unrelated to the idea that doing the same in C
              | code is equally reasonable. That's a developer choice.
        
       | willvarfar wrote:
       | Yummy :)
       | 
       | Makes me want to experiment too.
       | 
       | Is there some easy corpus with nice categorization of types
       | (drawing, painting, photo, comic, clip art etc) so we can all
       | easily compare?
        
       | stillicidious wrote:
       | > I want to preface this article by saying that I have no idea
       | what I'm doing
       | 
       | Instant hire!
        
       | dbrgn wrote:
       | Looks nice!
       | 
       | Did you try fuzzing your implementation (e.g. with AFL++ or
       | libfuzzer)? Codecs and especially decoders written in C that
       | handle untrusted binary data are quite worthwhile targets for
       | fuzzing.
        
       | SavantIdiot wrote:
       | What I like about this approach is that it requires zero
       | differential calculus. People forgot that Radix2 DCTs come from a
       | century's worth of advanced math in the S domain. This approach
       | is applied-computer-science based, I can get my head around that
       | a lot better than a Laplace/Z transform.
       | 
       | But what shocks me is: how did absolutely no one apply such a
       | naive RLE in 30 years of internet image formats?
       | 
       | Is this a case of "man reinvents wheel again because why do
       | research" or is this exceptional?
        
         | a_e_k wrote:
         | > But what shocks me is how did absolutely no-one apply such a
         | naive RLE in the 30 years of internet image formats?
         | 
         | Off the top of my head, both PCX [0] and TGA [1] use similarly
         | naive RLE as their compression method.
         | 
         | And while they also offer fancier compression methods, BMP [2],
         | TIFF [3], and OpenEXR [4] all offer RLE modes as well.
         | 
         | [0] https://en.wikipedia.org/wiki/PCX
         | 
         | [1] https://en.wikipedia.org/wiki/Truevision_TGA
         | 
         | [2] https://en.wikipedia.org/wiki/BMP_file_format
         | 
         | [3] https://en.wikipedia.org/wiki/TIFF
         | 
         | [4] https://en.wikipedia.org/wiki/OpenEXR
        
         | Trixter wrote:
         | RLE on runs of deltas is in very many compression schemes; it's
         | a common technique (probably so common that many authors don't
         | deem it worth mentioning). Lossless video codecs in particular
         | (huffyuv, lagarith, utcodec, etc.) have had this going back 20
         | years at least.
        
       | wongarsu wrote:
       | Is it just me, or is the compression algorithm not really less
       | complex than the one in PNG? I will grant the author that if
       | you don't need features like support for color spaces, different
       | bit depths
       | or interlacing you can skip the entire container format. But for
       | reference this is the core part of png:
       | 
       | For each scanline write one byte for the chosen filter method,
       | then $width pixels. There are five filter types. In type 0, write
       | the pixels unmodified. In type 1, write the difference to the
       | previous pixel. In type 2, write the difference to the above
       | pixel. In type 3, write the difference to the average of the
       | above and left pixel. In type 4, write the difference to the
       | Paeth predictor (a simple function calculated from the above,
       | left and left above pixel). Then compress everything with
       | deflate.
       | 
       | Obviously by modern standards that isn't performant, not least
       | because deflate sucks by modern standards. But it's simple, can
       | be implemented in a couple lines of C (if you pull in deflate as
       | dependency), and if you don't do exhaustive searches for the best
       | compression it can certainly run in O(n)
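       | 
       | For reference, the only non-trivial piece is the Paeth
       | predictor, and even that is just a few lines (a sketch following
       | the description in the PNG spec):
       | 
       |     #include <stdlib.h>
       | 
       |     /* Pick whichever of left (a), above (b), upper-left (c) is
       |        closest to the linear estimate a + b - c. */
       |     static unsigned char paeth(unsigned char a, unsigned char b,
       |                                unsigned char c) {
       |         int p  = a + b - c;
       |         int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
       |         if (pa <= pb && pa <= pc) return a;
       |         if (pb <= pc) return b;
       |         return c;
       |     }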
        
         | gpvos wrote:
         | The 64-pixel lookback makes it a bit similar to LZ-style
         | compression like Deflate, but the latter is not O(n) because it
         | can recognize multi-pixel sequences. QOI is just a slightly
         | beefed up run-length encoding.
        
       | ErikCorry wrote:
       | I like the small size of the decoder state - not much more than
       | 256 bytes. I haven't found a way to decode GIF with less than
       | about 16k of mutable state. PNG is similar though you can use the
       | decoded image if it's available.
        
       | benlivengood wrote:
       | I wonder if the author tried making pixel diffs capable of using
       | lookback. They'd be one byte longer but would possibly happen a
       | lot more often than full pixel values. It's also reminiscent of
       | the delta blocks from MPEG. If anything I would trade the 5-bit
       | run for delta-from-lookup on a whim (the cases where the ratio
       | between short runs vs. exact lookback is extreme seems like it
       | might only happen for pinstripe patterns). I should probably
       | implement the change and test it before posting...
        
         | adgjlsfhk1 wrote:
         | The hard part here is figuring out which pixel to diff to. You
         | would almost certainly have to look at all 64 pixels which
         | would be a lot slower. That said, you can probably get
          | noticeably better compression from it.
        
           | benlivengood wrote:
           | I think O(log(M)) where M is the number of remembered pixels.
           | Index them by their (R,G,B,A) quad and do a range lookup
           | sequentially by pixel, since any valid pixels would be within
           | the same range as earlier potential matches.
        
       | SkeuomorphicBee wrote:
       | Really cool encoding, it is a bit incredible that such simple
       | rules can achieve such good results.
       | 
       | It is so interesting that if the image has few colors, method #2
       | alone (an index into a previously seen pixel) encodes the whole
       | image into an indexed colour space (with the hash table becoming
       | the palette).
       | 
       | I've already seen some comments recommending adding some vertical
       | locality by tiling, Hilbert order, or Z-order. I have yet another
       | vertical locality suggestion to try:
       | 
       | While encoding/decoding, keep a ring buffer the size of one full
       | line of the image, storing each encoded/decoded pixel there. Then
       | add a variation of method #3 (the difference to the previous
       | pixel) which instead encodes the difference to the pixel above
       | the current one (the oldest pixel in the ring buffer). Maybe, due
       | to the limited "tag" space, it would not be worth implementing it
       | for DIFF8, but only for DIFF16 (taking one bit from red for the
       | additional tag bit) and DIFF24 (I'm not sure which bit I would
       | steal there). A sketch of the idea follows.
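       | 
       | (A sketch only; px_t, DIFF16_UP and the chunk emission are
       | hypothetical, just to illustrate the per-column ring buffer.)
       | 
       |     #include <stdint.h>
       |     #include <stdlib.h>
       | 
       |     typedef struct { uint8_t r, g, b, a; } px_t;
       | 
       |     /* prev_row[] is the ring buffer: one slot per column. Before
       |        slot x is overwritten it still holds the pixel one row
       |        above (x, y), so an "up" delta can be tried alongside the
       |        usual "left" delta. */
       |     static void encode_row(const px_t *row, int width,
       |                            px_t *prev_row) {
       |         for (int x = 0; x < width; x++) {
       |             px_t up = prev_row[x];
       |             int dr = row[x].r - up.r;
       |             int dg = row[x].g - up.g;
       |             int db = row[x].b - up.b;
       |             if (abs(dr) < 8 && abs(dg) < 8 && abs(db) < 8) {
       |                 /* emit a hypothetical DIFF16_UP chunk here */
       |             }
       |             prev_row[x] = row[x];  /* now holds the current row */
       |         }
       |     }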
       | 
       | Edit: Also, when thinking about video encoding in the future you
       | may want to keep a ring buffer fitting the full last frame of
       | video. And then temporal locality is probably even more
       | important, so then you may want to change the tags to encode more
       | commands relating to the pixel from the previous frame.
        
         | tda wrote:
         | Nice idea, I think it is somewhat more in line with the
         | keep-it-simple option, as I, e.g., have no idea what a Hilbert
         | curve for an arbitrary-size rectangle looks like. And you still
         | get compression in both the x and y directions (though no
         | run-length encoding in the vertical; perhaps a further
         | addition).
         | 
         | I quite like this approach of defining a few compression
         | methods, see which one works best, and apply that one. I used a
         | similar strategy for time series compression where I would take
         | a chunk of data, apply a few different compression algorithms
         | on it (RLE, delta RLE, etc.) and just keep whichever one was
         | smallest. Of course here it is done on a per pixels basis, and
         | I used arbitrary size chunks. The latter is trivial to
         | parallelize. Same could be applied here though, split the image
         | in half and compress both parts independently
        
         | gfody wrote:
         | Having done my own experiments with indexed palettes, I suspect
         | method #2 is responsible for most of the compression here. Most
         | photographs only have half a million distinct colors, and that
         | drops considerably as you focus in on an area. If you're
         | willing to relax the single-pass requirement, partitioning the
         | whole image by palette, similarly to how one encodes high-color
         | images into multi-frame GIFs, can yield surprisingly good
         | compression.
        
       | a_e_k wrote:
       | QOI_DIFF16 {
       |     u8 tag  :  3;   // b110
       |     u8 dr   :  5;   // 5-bit   red channel difference: -15..16
       |     u8 dg   :  4;   // 4-bit green channel difference:  -7.. 8
       |     u8 db   :  4;   // 4-bit  blue channel difference:  -7.. 8
       | }
       | 
       | I'd think that the extra bit ought to be given to green since
       | that tracks most closely with luminance. That would make this
       | somewhat similar to the RGB565 layout for 16-bit pixels
       | (https://en.wikipedia.org/wiki/High_color#16-bit_high_color).
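       | 
       | In the notation of the struct above, the reshuffled layout would
       | look something like this (purely hypothetical, not part of the
       | current format):
       | 
       |     QOI_DIFF16_ALT {
       |         u8 tag  :  3;   // b110
       |         u8 dr   :  4;   // 4-bit   red channel difference:  -7.. 8
       |         u8 dg   :  5;   // 5-bit green channel difference: -15..16
       |         u8 db   :  4;   // 4-bit  blue channel difference:  -7.. 8
       |     }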
        
       | Ono-Sendai wrote:
       | Very cool. Typo: "I barely understand how Huffman Coding and DTC
       | works" I guess you mean DCT? (discrete cosine transform)
       | 
       | One other thought: To make it so that compression and
       | decompression can be multithreaded, you might want to 'reset' the
       | stream every N rows. (i.e. break any RLE runs, start from a
       | colour literal). This would allow a bunch of threads to start at
       | different places in the image, in parallel. There would be some
       | small cost to compression ratio but might be worth it.
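       | 
       | One way to picture the "reset every N rows" idea: encode
       | fixed-height bands of the image independently (each starting
       | from a fresh state) and record where each band begins, so
       | decoder threads can seek straight to a band. The encode_band
       | helper below is a self-contained stand-in, not the qoi.h API:
       | 
       |     #include <stdint.h>
       |     #include <string.h>
       | 
       |     /* Stand-in per-band encoder: just copies the raw RGBA bytes
       |        so the sketch compiles; a real one would reset the QOI
       |        run/index/previous-pixel state and emit QOI chunks. */
       |     static size_t encode_band(const uint8_t *px, int width,
       |                               int rows, uint8_t *out) {
       |         size_t n = (size_t)width * rows * 4;
       |         memcpy(out, px, n);
       |         return n;
       |     }
       | 
       |     /* Encode the image as independent bands of band_rows lines
       |        and record each band's offset; the offset table would go
       |        in the container so threads can decode in parallel. */
       |     size_t encode_banded(const uint8_t *px, int width, int height,
       |                          int band_rows, uint8_t *out,
       |                          size_t *band_offsets) {
       |         size_t pos = 0;
       |         int band = 0;
       |         for (int y = 0; y < height; y += band_rows, band++) {
       |             int rows = (y + band_rows <= height)
       |                            ? band_rows : height - y;
       |             band_offsets[band] = pos;
       |             pos += encode_band(px + (size_t)y * width * 4,
       |                                width, rows, out + pos);
       |         }
       |         return pos;
       |     }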
        
         | phoboslab wrote:
         | Thanks, fixed!
         | 
         | I'll probably investigate "resetting" the stream to allow for
         | multithreaded en-/decode when I try to roll this into a video
         | codec.
        
           | mkl wrote:
           | Even better might be breaking it into tiles, so you get some
           | vertical locality happening as well as horizontal.
           | 
           | It would be interesting to know which strategy is used for
           | each pixel. Have you tried making maps of that with four
           | different colours? Particularly interesting would be how much
           | the cache is used and also how often it replaces a colour
           | that it could have used later. Maybe there's a better colour
           | hash. BTW you might want to change "(really just r^g^b^a)" to
           | "(really just (r^g^b^a)%64)".
        
             | iainmerrick wrote:
             | I really like the trick of caching previously-seen pixels
             | by value rather than position. Very nifty!
             | 
             | I was thinking this would combine well with progressive
             | rendering (i.e. store pixels in mipmap-style order) so the
             | cache is warmed up with spread of pixels across the image
             | rather than just scanline order. That doesn't make it
             | parallelizable in the way tiling does, though, hmm.
             | 
             | Another tweak that would probably help is having the file
             | specify a rotation (or even a full mapping table) for each
             | component of the hash, so a clever encoder can pick values
             | that minimise collisions. (A fast encoder can just use all-
             | zeroes to get the same behaviour as before.)
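             | 
             | A sketch of that parameterised hash, with hypothetical
             | per-component rotation amounts that the encoder would
             | write into the header (all-zero rotations reduce to the
             | plain XOR hash used today):
             | 
             |     #include <stdint.h>
             | 
             |     /* Rotate an 8-bit value left by n bits (n in 0..7). */
             |     static uint8_t rotl8(uint8_t v, unsigned n) {
             |         return (uint8_t)((v << n) | (v >> ((8 - n) & 7)));
             |     }
             | 
             |     /* rot_r..rot_a are hypothetical header fields chosen
             |        by the encoder to minimise collisions. */
             |     static unsigned hash_rot(uint8_t r, uint8_t g,
             |                              uint8_t b, uint8_t a,
             |                              unsigned rot_r, unsigned rot_g,
             |                              unsigned rot_b, unsigned rot_a) {
             |         return (rotl8(r, rot_r) ^ rotl8(g, rot_g) ^
             |                 rotl8(b, rot_b) ^ rotl8(a, rot_a)) % 64;
             |     }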
        
           | Trixter wrote:
           | Before you start rolling this into a video codec, please
           | consult the sources of UT Codec, which has been state of the
           | art in this space for many years. Or, at least, use UT Codec
           | as a direct basis for comparison, to see if you are beating
           | it in terms of compression/decompression speed.
           | 
           | Modern lossless video codecs don't care that much about
           | saving space, since you're burning gigabytes per minute
           | anyway; the key is that the compression and decompression are
           | as transparent as possible, to reduce the bandwidth to/from
           | the storage medium. A good lossless codec can be used to
           | scrub through and edit on a video editor's timeline, so
           | decomp performance is what's most important.
           | 
           | Also, any such lossless video codec is going to need more
           | than 8 bits per component; most real-world footage coming off
           | of phones and cameras is 10-bit Rec.2020. If it is too
           | difficult to position QOI for real-world sources, you can
           | certainly keep it 8-bit and market it as an animation codec
           | or similar; just keep it in mind.
        
       | londons_explore wrote:
       | This is a great project, but personally I feel that there are
       | lots of uses for uncompressed images (ie. random access, ready to
       | render straight away), and lots of uses for well compressed
       | lossless, but little use for 'slightly compressed' images.
       | 
       | The normal argument for 'slightly compressed' images is that they
       | can be decompressed with very few CPU cycles, but the reality is
       | that on modern platforms lossless image en-/decoding is done in
       | hardware and is therefore pretty much 'free', as long as you pick
       | lossless H.264, lossless WebP or lossless HEVC. All of those will
       | also give much smaller file sizes.
       | 
       | In the few places where hardware decode isn't a thing (eg. the
       | memory-constrained 8 bit microcontroller decoding the boot screen
       | for your IoT toaster), there usually isn't a need for
       | losslessness.
        
         | iainmerrick wrote:
         | "Slightly compressed" is heavily used in 3D graphics -- see
         | ETC2, DXT, etc.
         | 
         | It's useful because the size savings apply _in memory_ ,
         | because decompression is so cheap it can be done on the fly.
         | 
         | It has to be lossy because you want a guaranteed fixed-rate
         | compression ratio (eg 4x) and there's no way to do that
         | losslessly.
        
       | makapuf wrote:
       | For a good, simple and fast image compressor, look at DXT
       | (decoded natively by graphics cards). No benchmarks for encoder
       | performance though.
        
       | st_goliath wrote:
       | Out of curiosity, I modified the benchmark program to simply toss
       | the pixel data into lz4/lz4hc for comparison (patch file:
       | https://paste.debian.net/1220663/).
       | 
       | I'm quite impressed that the resulting size is a little smaller
       | than lz4hc (and not even that far away from libpng), while the
       | encoding speed is quite close to lz4, despite seemingly not
       | having as many hacks and tweaks under the hood to get there. So
       | the encoder is IMO doing amazingly well.
       | 
       | However, liblz4's _decoding_ speed is ahead by roughly an order
       | of magnitude. But again, liblz4 probably benefits from decades
       | of tweaking & tuning to get there. It would be interesting to
       | see how much more performance could be gotten out of the qoi
       | decoder.
       | 
       | Here are the _totals_ I get for the "textures" benchmark
       | samples:
       | 
       |                decode ms   encode ms   decode mpps   encode mpps   size kb
       |   libpng:          2.2        32.4         58.67          3.94        160
       |   stbi:            2.1        17.0         61.50          7.49        228
       |   qoi:             0.7         0.7        191.50        170.54        181
       |   lz4:             0.1         0.6       1226.06        206.40        258
       |   lz4hc:           0.1        70.9       1029.26          1.80        200
       | 
       | And for the "wallpaper" samples. The spreads seem to be a bit
       | more in favor of qoi here:                         decode ms
       | encode ms   decode mpps   encode mpps   size kb       libpng:
       | 131.9      2287.1         66.63          3.84      8648
       | stbi:       147.5      1177.1         59.55          7.46
       | 12468       qoi:         56.3        56.5        156.13
       | 155.60      9981       lz4:         14.3        53.1
       | 614.53        165.50     18019       lz4hc:       13.9
       | 1901.7        630.94          4.62     12699
        
         | injinj wrote:
         | I believe libpng's default zlib compression level is 6. libpng
         | may use CRC (or Adler) checksums, which account for around 30%
         | of the decode time. In my experience with zlib, the higher
         | compression levels have faster decode times. My reasoning at
         | the time (early '00s) was that higher compression tends to
         | encode longer runs of copies from the dictionary, so more time
         | is spent in the copy loop rather than in the input and decode
         | loop.
        
         | zenogais wrote:
         | Like the author said this is completely unoptimized. The
         | natural next step in optimization might be to profile and then
         | SIMD optimize the slow bits in compression and decompression.
         | This would likely produce a significant speedup and may even
         | bridge the gap with lz4.
        
         | Trixter wrote:
         | > liblz4 probably benefits from decades of tweaking & tuning to
         | get there
         | 
         | Decade, singular. LZ4 appeared around 2011. But the real
         | reason it's faster is that it's not doing nearly as much work
         | as QOI. LZ4 is just about as asymmetric as you can get.
        
         | willvarfar wrote:
         | I believe (from quick code inspection) that the symmetry in
         | encode/decode performance for QOI is because it has to generate
         | the hash-table on decode too.
         | 
         | Normal fast encoders use some kind of hash table or search to
         | find matches, but encode the offset of the match in the source
         | rather than in the table. QOI is encoding the offset into the
         | table, which gives much shorter offsets but means the decoder
         | has to maintain the table too.
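         | 
         | That symmetry is easy to see in a stripped-down view of the
         | per-pixel bookkeeping: whatever chunk produced the current
         | pixel, the decoder must refresh the same 64-entry table the
         | encoder used, or later index references would resolve to the
         | wrong colours (a sketch, not the actual qoi.h code):
         | 
         |     typedef struct { unsigned char r, g, b, a; } px_t;
         | 
         |     /* Both encoder and decoder run this after every pixel, so
         |        the work is roughly symmetrical -- unlike LZ-style
         |        offsets into the source itself. */
         |     static void refresh_table(px_t table[64], px_t px) {
         |         unsigned slot =
         |             (unsigned)(px.r ^ px.g ^ px.b ^ px.a) % 64;
         |         table[slot] = px;
         |     }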
         | 
         | (The slow PPM compressors etc do maintain the same table in
         | both encoder and decoder, and have symmetrical encode/decode
         | performance too. See http://www.mattmahoney.net/dc/text.html)
         | 
         | It's not really in the spirit of QOI to add the complexity,
         | but I'd imagine allowing the encoder to specify how big the
         | hash table is would be only a small tweak, and encoding
         | literal blocks instead of going pixel-by-pixel would better
         | handle input that QOI can't compress.
         | 
         | I would be curious to know if planar helps or hinders. Perhaps
         | even see what QOI makes of YUV etc. And I want to see 'heat
         | maps' of encoded images, showing how cheap and expensive parts
         | of them are, and which block type gets used. Yeah, going a bit
         | beyond the spirit of QOI :D
         | 
         | And from looking at the code I'm a bit confused how 3-channel
         | is handled for literals because the alpha still looks like it
         | gets encoded. Would have to walk through the code to understand
         | that a bit better. Is it going to deref off the end of the
         | source RGB image? Etc.
         | 
         | (People interested in compression may be interested in the go-
         | to forum for talking about it
         | https://encode.su/threads/3753-QOI-(Quite-OK-Image-
         | format)-l...)
        
         | eisbaw wrote:
         | What about qoi followed by lz4?
        
       | renonce wrote:
       | So it looks like the algorithm diffs neighbouring pixels,
       | assumes that the difference is likely to be small, and encodes
       | the difference in 1, 2, 3 or 4 bytes. It does this without
       | complex things like Huffman trees and therefore gets a major
       | improvement in performance while approaching the compression
       | ratio of zlib. That's interesting.
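       | 
       | In rough C, the decision described above is just a magnitude
       | check on the per-channel differences; the thresholds below are
       | illustrative and not the exact QOI tag ranges:
       | 
       |     typedef enum {
       |         DIFF_1_BYTE, DIFF_2_BYTES, DIFF_3_BYTES, FULL_COLOR
       |     } chunk_t;
       | 
       |     /* Spend more bytes only when the pixel is far from the
       |        previous one. */
       |     static chunk_t pick_chunk(int dr, int dg, int db, int da) {
       |         if (da == 0 && dr >= -2 && dr <= 1 &&
       |             dg >= -2 && dg <= 1 && db >= -2 && db <= 1)
       |             return DIFF_1_BYTE;
       |         if (da == 0 && dr >= -16 && dr <= 15 &&
       |             dg >= -8 && dg <= 7 && db >= -8 && db <= 7)
       |             return DIFF_2_BYTES;
       |         if (dr >= -16 && dr <= 15 && dg >= -16 && dg <= 15 &&
       |             db >= -16 && db <= 15 && da >= -16 && da <= 15)
       |             return DIFF_3_BYTES;
       |         return FULL_COLOR;
       |     }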
        
       | yboris wrote:
       | Obligatory comment about the new file format that's going to
       | replace jpegs: _JPEG XL_
       | 
       | https://jpegxl.info/
       | 
       | It also has a lossless mode (in addition to being better at
       | compression when you want a lossy format).
        
       | eisbaw wrote:
       | I'd like to see this in xpra.
       | 
       | Does anybody have an idea whether QOI+lz4 would improve the
       | compression ratio further?
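       | 
       | One cheap way to check is to run the QOI-encoded buffer through
       | LZ4's one-shot API and compare sizes (a sketch; qoi_data and
       | qoi_len stand in for whatever a prior qoi encode produced):
       | 
       |     #include <stdlib.h>
       |     #include <lz4.h>
       | 
       |     /* Returns 1 if LZ4 made the QOI-encoded buffer any smaller. */
       |     int lz4_helps(const char *qoi_data, int qoi_len) {
       |         int cap = LZ4_compressBound(qoi_len);
       |         char *dst = malloc(cap);
       |         int n = LZ4_compress_default(qoi_data, dst, qoi_len, cap);
       |         free(dst);
       |         return n > 0 && n < qoi_len;
       |     }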
        
       | wwalexander wrote:
       | Are there compression algorithms that take longer than O(n)? My
       | impression is that even two-pass encoding is O(n) with a large
       | constant factor in front of it, but I've never actually thought
       | this much about it.
        
         | selcuka wrote:
         | Some LZ variants such as LZ77 and LZW are O(n), but LZH, for
         | example, is O(n log n) because of Huffman encoding. I agree that
         | O(n) is not revolutionary, but the simplicity of this algorithm
         | makes it faster to execute.
        
           | eutectic wrote:
           | Isn't the log factor only in the alphabet size?
        
       | bob1029 wrote:
       | This is very impressive work.
       | 
       | One thing I did recognize after my last trip into the rabbit hole
       | - JPEG can be the fastest format by orders of magnitude if you
       | are using an appropriate encoder. The project mentions 20x-50x
       | speedups over PNG which are commendable, but once you throw
       | LibJpegTurbo into the mix (it does use SIMD), you will find a
       | substantially different equation on your hands. I can do 1080p
       | 4:4:4 JPEG encodes in ~3ms on my workstation today.
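       | 
       | For context, the kind of call being described looks roughly
       | like this with libjpeg-turbo's TurboJPEG API (quality,
       | subsampling and flags are illustrative choices, not a
       | recommendation):
       | 
       |     #include <stdio.h>
       |     #include <turbojpeg.h>
       | 
       |     /* Compress one 8-bit RGB frame; tjCompress2 allocates the
       |        output buffer for us. */
       |     int encode_frame(const unsigned char *rgb, int width, int height) {
       |         tjhandle tj = tjInitCompress();
       |         unsigned char *jpeg = NULL;
       |         unsigned long jpeg_size = 0;
       |         int rc = tjCompress2(tj, rgb, width, 0 /* pitch */, height,
       |                              TJPF_RGB, &jpeg, &jpeg_size,
       |                              TJSAMP_444, 90, TJFLAG_FASTDCT);
       |         if (rc == 0) printf("encoded %lu bytes\n", jpeg_size);
       |         tjFree(jpeg);
       |         tjDestroy(tj);
       |         return rc;
       |     }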
       | 
       | I am not saying you should always use jpeg (some people like to
       | keep all their original pixels around), but if you want to go
       | really _really_ fast you should consider it. The latency is so
       | low with this approach that I have been looking at the
       | feasibility of building a streaming gaming platform around it.
       | Current status of that effort is somewhere between  "holy shit it
       | actually works" and "maybe?"
        
         | bno1 wrote:
         | JPEG is not lossless though.
        
       | dexen wrote:
       | I am in awe of how the post neatly interleaves the why and the
       | how. Having recently implemented an LZ4 decompressor (itself a
       | rather reasonable and "lightweight" standard), I like QOI even
       | better. Hoping to see the video codec the author mentions as a
       | possibility!
        
       ___________________________________________________________________
       (page generated 2021-11-24 23:00 UTC)