[HN Gopher] QOI: Lossless Image Compression in O(n) Time
___________________________________________________________________
QOI: Lossless Image Compression in O(n) Time
Author : Ragnarork
Score : 790 points
Date : 2021-11-24 10:51 UTC (12 hours ago)
(HTM) web link (phoboslab.org)
(TXT) w3m dump (phoboslab.org)
| adgjlsfhk1 wrote:
| Why not separate the format for RGB vs RGBA based on a header?
| Since images will never have both RGB and RGBA pixels, you would
| be able to get an extra few percent encoding by using different
| (and conflicting) tags for the two types of images. For example
| you could have QOI_COLOR_RGB which would use a 5 byte tag and
| QOI_DIFF24_RGB could store 7-bit differences for two channels and a
| 6-bit difference for the third channel. For RGBA, you could do
| something similar for the two tags that are RGB specific.
| HanClinto wrote:
| Or even based on a #define if one didn't want to check a bit
| flag every time you need to encode a full pixel. Would not be
| difficult to have two versions of the library -- one for RGB,
| and one for RGBA.
| tonmoy wrote:
| I am not familiar with image encodings, but isn't the idea of
| complexity relative? For example getting rid of Huffman encoding
| and byte aligning everything may be less complex for computers,
| but when making energy and area efficient hardware, we may not
| care that much about some of these things?
| selcuka wrote:
| For a small number of operations, sure, but when you need to
| compress a million, or a billion images, the performance
| optimisations quickly add up.
| khitchdee wrote:
| This algo would work well for computer generated synthetic
| images, not as well for scanned images.
| ripdog wrote:
| Which is why the author is positioning it as an alternative to
| PNG, not JPEG.
| jiggawatts wrote:
| I know you don't like "design by committee" and overly complex
| formats. Also, the compression algorithm itself is somewhat
| independent of the "container format" that it's used with.
|
| However, if I may make a suggestion: If you _do_ end up rolling
| your own container format for any reason, always include a way to
| store the colour space of the image!
|
| Treating RGB images as arrays of bytes without further tags is
| like treating char* as "text" without specifying the code page.
| It _could_ be UTF8. Or Latin-1. Or something else. You don't
| know. Other people don't know. There's often no way to tell.
|
| Similarly with colour spaces: sRGB, Display P3, and Adobe RGB are
| _just_ close enough to be easily confused, but end up looking
| subtly wrong if mixed up.
| fabrixxm wrote:
| It has data size in header:
| https://github.com/phoboslab/qoi/blob/master/qoi.h#L77-L82 Just
| put exif info after data :P
| sam0x17 wrote:
| I've never understood the need for color profiles on images
| themselves. On monitors and displays, sure, those vary, but why
| allow people to specify a color profile on an image, thus
| making images no longer a ground truth for pixel value? It
| seems like the main use case is dealing with images that
| already have color profiles. What is the point?
| kaetemi wrote:
| Because your pixels need to transform from the image color
| profile to the display color profile.
|
| By default this all falls back to sRGB, which is not the
| "ground truth pixel value" you may be assuming it is.
| sam0x17 wrote:
| Right, which begs the question why on earth would there be
| more than one image color profile. The image color profile
| should just be implied and the same everywhere.
| kaetemi wrote:
| For correct editing operations, and correct display.
| Sadly, it's not pedantry, it's reality.
|
| The 0.5 value is not half the brightness of 1.0 in sRGB.
| It merely means half the power output to your CRT.
|
| If you want correct editing operations, you need to work
| in linear RGB.
|
| It gets more fun. What is white? The color of your
| backlight? The color of white paper under your particular
| desk light?
|
| And what is black? Is it pure darkness, or the black ink
| of your printer? Does your printer represent pure black,
| or dark gray?
|
| We're stuck with images stored in the display profile of
| old CRTs by default, because that was the most practical
| option at the time.
| sam0x17 wrote:
| > We're stuck with images stored in the display profile
| of old CRTs by default, because that was the most
| practical option at the time.
|
| The analogue in photography is that photographs lose color
| clarity and fade as they age. Why should I care if this
| is the case in images as display technology evolves when
| this is already a problem with every physical medium?
|
| > For correct editing operations, and correct display.
| Sadly, it's not pedantry, it's reality.
|
| I've been designing images and graphics for the web and
| sometimes for print off and on as a regular part of my
| various job functions since ~2009 and I have yet to see a
| situation where a color profile on an image isn't a huge
| pain and something that needs to be stripped away to get
| seemingly correct results.
|
| Back to my point, I still don't see the value of having a
| color profile applied to an image. Images exist in the
| ether as platonic ideals, and we try to approximate them
| with our various display technologies. Why complicate the
| matter by also having multiple "flavors" of the platonic
| ideal itself? When I say (255, 0, 0), I expect the
| display to show me its best approximation of red. When I
| say (255, 255, 255), I expect the display to show me
| something as close to white as possible (and at the
| brightest possible setting). When I say (0, 0, 0), I
| expect the display to show me something that looks as
| close to black as possible. It's up to the display
| technology to decide whether this means just turn off
| that pixel on the screen or disengage the backlight or do
| whatever, but at the end of the day it's just trying to
| approximate black.
|
| This is complicated enough, why do I need an image that
| will look good on only one kind of display and will have
| its greens look pink on other displays? Isn't having a
| color profile for the display enough?
| kaetemi wrote:
| I get what you mean, but... the color profile of the
| image is the profile of the display where the image can
| be displayed without adjustment. Think of it as the
| display profile of the guy that sent you the image. The
| magic math transforms the image to your display profile.
| That means it will look exactly the same on both
| displays. If they both have a correct display profile.
|
| If your green is going pink, then either your profiles
| are wrong, or your software is broken. Maybe it really is
| pink, and it just looks green for you, because you're
| ignoring color profiles.
|
| But the fact is, most software is broken, and you should
| store images with the sRGB profile.
|
| And also, you can actually calibrate consumer hardware,
| so that you can scan a photo, and reprint it. And the
| scan, display, and print will look _exactly_ the same.
| (It's not the case by default, because consumer printers
| do what you do, stretch or fit the color space to the
| printer's. The Vivid and Natural profiles in the driver,
| respectively. This is a good default for documents, not
| for professional photography.)
| romwell wrote:
| >That means it will look exactly the same on both
| displays. If they both have a correct display profile
|
| We can continue believing that Santa exists, or we can
| accept that effectively nobody has correct color
| profiles, and nobody cares either.
|
| It's nice metadata you have there, would be a shame if I
| applied _night mode_ to it at 6PM.
|
| > also, you can actually calibrate consumer hardware
|
| ...with professional hardware that costs more than the
| one you have at hand.
|
| Again, pretty much everyone will just tweak their
| monitor/printer settings until they get results that look
| OK.
|
| >display and print will look exactly the same
|
| Until you turn off the lights. Or until you turn _on_ the
| lights (what color temperature is your light?). Or until
| the wind blows, moving that cloud over the sun, changing
| light temperature.
|
| Or -- wait for it -- actually, that's all, while you've
| been waiting the sun has set.
|
| The point being, all we can shoot for is FF0000 being
| what would be "red" for most, 00FF00 being "green", and
| 0000FF being "blue" -- and then accept the "fade" of the
| digital image from one screen to another as a fact of
| life.
|
| So, learn how to stop worrying and love unspecified RGB
| colorspaces. Magenta only exists in your brain anyway [1]
|
| _Side note_: did you ever think about how your recorded
| music isn't adjusted for the frequency response of the
| loudspeakers, and half the people will listen to it with
| the "BASS MEGABOOST" feature on anyway?
|
| That's why the art of _mastering_ exists. It's accepting
| the reality, and putting work into making it work with
| _uncalibrated_, crappy hardware -- as well as hi-fi
| gear.
|
| _PS_: have fun calibrating your vision to make sure
| that you don't see green where I see red
|
| [1]https://medium.com/swlh/magenta-the-color-that-doesnt-
| exist-...
| cornstalks wrote:
| > _When I say (255, 0, 0), I expect the display to show
| me its best approximation of red._
|
| Which red? There isn't a single, true "red". And
| different materials produce different reds (my work's
| Sony BVM-X300's reddest red is going to look way
| different than your monitor's red). Not all displays use
| only RGB color primaries, too. For example, at Dolby's
| office in San Francisco they have a projector in their
| theater that uses 5 (IIRC) color primaries, not 3.
| 6-primary projectors exist. And other non-RGB displays.
|
| > _When I say (255, 255, 255), I expect the display to
| show me something as close to white as possible_
|
| Which white? D50? D55? D60? D65? D67? Something else? And
| yes, these different white points (and many others) are
| actually used in practice.
|
| > _(and at the brightest possible setting)._
|
| 100 nits looks way, way different than 4,000 nits. Some
| monitors can do 10,000 nits.
|
| > _When I say (0, 0, 0), I expect the display to show me
| something that looks as close to black as possible. It's
| up to the display technology to decide whether this means
| just turn off that pixel on the screen or disengage the
| backlight or do whatever, but at the end of the day it's
| just trying to approximate black._
|
| Which black? This might sound dumb, because we can agree
| that there is an objective "absolute black" (i.e. zero
| photons). But when an artist creates an image, the
| monitor they use has some version of black. If you don't
| account for that, the image may be distorted. Blacks can
| be crushed, for example.
|
| An absolute color space exists. It's called XYZ. We could
| use it. Some image formats support it.
| formerly_proven wrote:
| > An absolute color space exists. It's called XYZ. We
| could use it. Some image formats support it.
|
| XYZ is really annoying to work in though. ACES is a
| pragmatic solution here, quite literally:
| https://blog.frame.io/wp-content/uploads/2019/09/ACES-
| APO.jp...
|
| Okay, fair enough two out of three primaries are
| imaginary colors, but nobody said you have to use the
| whole color space when your processing pipeline is meant
| to be using 32 bit floats. Delivery formats might want to
| be using a more real color space.
| pixel_fcker wrote:
| ACES (AP0) is an archive format for when you want to be
| absolutely sure you're not clipping any colours. As a
| working space it's terrible and AP1 should be preferred,
| either as ACEScg or ACEScc
| Dylan16807 wrote:
| AP1 is almost the same as rec.2020? I'm not sure what you
| mean by 'working space' but if preserving all colors was
| a primary or even secondary goal I definitely wouldn't
| cut myself down that far.
| spc476 wrote:
| And the red in my left eye isn't the same as the red in
| my right eye. Yes, when the lighting conditions are _just
| right_, I see different hues out of each eye [1]. I have
| to wonder how one would (or should?) correct for that in
| an image file format.
|
| [1] I think from a stupid thing I did as a kid.
| dekhn wrote:
| took me a while to notice that I perceive color and light
| intensity differently in my two eyes. I think this is
| actually pretty natural (i.e., it happens commonly?).
| Either way, I can also see polarization (Haidinger's
| brush), which confused me a bunch when I was trying to
| explain what I saw and everybody else thought I was
| crazy.
| cornstalks wrote:
| I think the best we can do with file-level metadata is
| account for only objective metrics. Subjective
| differences are best handled outside of the file
| metadata. The user can tune their display settings to
| match their preferences. This allows for correcting for
| physiological differences and stylistic preferences. If
| all files had correct metadata (for objective metrics),
| it would be a lot easier for an end user to tune the
| system to their liking because their end results would be
| consistent with their preferences.
| kaetemi wrote:
| The sRGB color #ffffff should never mean "the brightest
| white on this monitor", unless you're using an average
| CRT.
|
| Just imagine you're using an HDR display, where the
| brightest white is as bright as the sun.
| makapuf wrote:
| OK, but then when will you use that "bright as sun" color?
| If never, why provide it? If so, what color will you use?
| formerly_proven wrote:
| sRGB is quite literally defined as "the average 90s CRT
| in an office environment", i.e. full-white sRGB on a HDR
| display should be around 100 nits or so in those
| reference conditions (i.e. display set to "reasonable"
| brightness).
| adgjlsfhk1 wrote:
| The color you will use isn't in sRGB, but in other
| color spaces such as Rec. 2020.
| kaetemi wrote:
| VR displays. Probably the color of the sun in linear RGB
| color space for rendering purpose, then converted to the
| display color space.
| formerly_proven wrote:
| You're assigning a color profile to the image in your
| model - the color profile of the display. Hence no color
| conversion needed, as source and target match.
|
| What "red" and "green" are has changed quite dramatically
| with different display technologies. A display designed
| to meet Rec.2020 can show colors that other displays
| literally cannot produce and the deviation between the
| primaries is so big that everything looks like garbage if
| you don't do a color space conversion. Take some sRGB
| content and display it on a DCI P3 display. Looks like
| shit. Humans look like crabs.
|
| > On monitors and displays, sure, those vary, but why
| allow people to specify a color profile on an image
|
| The sole reason why we have device profiles defining
| device color spaces is so that we can convert from the
| image's color space to the device's color space. If
| images don't have a profile assigned to them, you don't
| need a device profile.
|
| So you either have both image and device profile, or
| neither. Just one doesn't make sense.
| Sharlin wrote:
| What does "255" mean? If you just take it to mean "max
| saturation that your display can output" then it's going
| to be perceived as a different color depending on the
| viewer's display. That is undesirable. And due to
| historical reasons (the capabilities of CRT monitors) the
| meaning of "max saturation" was de facto standardized to
| what we now call sRGB. If you want to encode a greater
| range of colors than sRGB then you need to be able to say
| that hey, this "255" is more saturated than what sRGB
| considers "255" and should not be displayed on sRGB
| without conversion.
| AlotOfReading wrote:
| Because display technology can't reproduce the full set
| of colors we see and, moreover, we have a finite
| number of bits to encode the value of any pixel. A color
| space/profile is both a target for display manufacturers
| to produce and a way of standardizing the trade off
| between bit depth and quantization error.
| floatingatoll wrote:
| Unfortunately, that's not the case in pragmatic reality.
|
| When I say #ff0000, do I mean "sRGB red", as a TV would
| display in pure red phosphors back in the 1970s, or do I
| mean "1000K" / 700nm red (daylight is 6500K), which can
| only be reproduced by specialized displays?
|
| Most people from the time before Apple made Display P3
| wide color displays standard in all their products -- so,
| most HN commenters -- believe that #ff0000 just means
| "only red, no green or blue" -- but all web browsers
| currently are left to invent their own answer for what
| #ff0000 means, and they do not all invent the same
| answer. Yet.
|
| So it is with images.
|
| In the beginning times, people made sixteen-color images
| for NTSC monitors, and then other people learned you
| could display synthetic colors that don't exist by
| mangling the image bitstream to the monitor, and those
| sixteen colors were hard-coded but varied by like +/-5%
| per device due to acceptable manufacturing tolerances, so
| they were internally consistent anyways.
|
| And so we end up, today, trying to solve color profiles
| for file formats that specify colors as RGB hex values --
| which are, by definition, restricted to sRGB and thus
| wildly insufficient. But if we plan too far, we get file
| formats that are totally ridiculous - TIFF comes to mind
| - that can represent anything under the sun, at the cost
| of having a 1:1 mapping of "pixels" to "bytes" or worse.
|
| You may also find the history of the EXR format relevant
| here, as it supports a functionally infinite amount of
| dynamic range in images, and there are definitely good
| niche use cases for it -- but pragmatically, it's
| ridiculously overkill for normal everyday purposes. You
| could argue that everyone should use EXR in L*a*b, but
| then someone would suggest Oklab (since it makes better
| perceptual compromises) or HSLuv (what a great name), or
| you could try storing raw CIE 1931 chromaticity
| coordinates, which has been deprecated by CIE 170-2 to
| address serious flaws in the 1931 system.
|
| Tying this back to engineering, there are often very good
| reasons for designing a processor or a byte store to be
| RISC or CISC, big or little endian, and we simply cannot
| predict what will work _most pragmatically_ for all cases
| universally ahead of time, any more than the metric
| system was invented early enough in time to prevent miles
| per hour from being a unit of measure.
| kortex wrote:
| No single color space is complete. Each has trade-offs. I
| would actually say that color profiles help get _closer_ to
| ground truth, that is, rendering intent.
| demute wrote:
| CIE XYZ is complete.
| lstamour wrote:
| Isn't that kind of like saying Real numbers are complete?
| Yes, it's true, but we don't have an infinite level of
| precision for each colour. So we have to, practically
| speaking, limit how many colours can be specified to some
| reasonable (and arbitrary) number, e.g. 10 million or 1
| billion or more (in terms of number of colours in a
| colour space) and representation in digital bits. Also,
| since I conflated the two a bit, it is often helpful to
| restrict what the max range of a colour is so that you
| can get more useful data out of the same precision. It's
| kind of like creating a dictionary to compress data, you
| can express more useful values with less data by choosing
| to display a subset of colours and by picking where the
| white point is for your display. These optimizations used
| to mean more than they do today, perhaps, but it can
| still take a lot of effort to transfer 10-bit or 12-bit
| raw pixel data at 4K resolution, 120 times a second...
| And it's worth pointing out that while 8-bit is all you
| might need as precision for perceptually lossless HDR
| (with dithering), that kind of range isn't good enough
| for colour editing work. It's like starting a photo edit
| from the JPEG rather than the RAW file.
| adgjlsfhk1 wrote:
| 8-bit just objectively isn't enough for HDR. 8 bits is
| few enough that you can pretty easily see banding on a
| simple color gradient in SDR. Dithering helps a lot, but
| once you get to HDR, 10 bits is pretty much the minimum.
| Dylan16807 wrote:
| > Isn't that kind of like saying Real numbers are
| complete?
|
| It's more like saying the range from 0..2 is complete so
| don't use an encoding that caps out at 0.8 or 1.1
|
| The difference between all visible colors and a more
| constrained color space is only a factor of two or three.
| Less than half a bit per channel.
| kortex wrote:
| CIE XYZ has imaginary colors, though.
|
| I guess when I said "no color space is complete" I meant
| "no color space is faithfully bijective to nominative
| human visual sensory response." Because such a thing is
| basically impossible (or insanely nonlinear and
| unwieldy).
|
| Also these color systems all or almost all assume a
| "norminal" (to steal a word from Everyday Astronaut,
| normal+nominal) average human vision system. If your
| system isn't norminal, average, human, or even visual
| (visible light), then your color space is an incomplete
| map of the territory.
|
| So for any given array of pixel channel, there is either
| an implicit assumption or explicit color space of what
| those values "mean".
| Dylan16807 wrote:
| Why does it matter if it's bijective?
| kortex wrote:
| So GP's question was basically "why do we need color
| profile?" and the short answer is "to know what images
| _should_ look like".
|
| In order to know what an image _should_ look like, a
| "perfect" or "complete" representation, at minimum the
| real colors must be injective to the color space (every
| value has at most 1 color). You could argue that the
| color space needn't be surjective (values that have no
| color), which is probably fine if your image is static
| and you never manipulate it. As soon as you manipulate
| values of your image in your color space, you run the
| risk of falling outside the color domain.
|
| But really to unpack this whole thread, my essential
| point is that color spaces are about engineering
| tradeoffs. sRGB can't reproduce many colors. CIE 1931
| represents all color but has imaginary colors.
|
| Ergo, I contend there is no "complete" color space.
| contravariant wrote:
| If you agree with their model of human vision anyway.
| makapuf wrote:
| Using one given color profile is part of the trade-off.
| dahart wrote:
| The point is to use bits effectively and not waste them, by
| accounting for what _usually_ happens on most displays and in
| our eyes & brains. If we don't account for it, then (for
| example) a 1-bit difference between two bright colors means
| something very different from a 1-bit difference between two
| dark colors.
|
| So what is "ground truth" exactly? We need to know what units
| we're talking about; is it lumens, or "tristimulus" values
| (what we want our eyes to see), or LCD monitor voltages, or
| something else? First we have to agree on what ground truth
| means.
|
| A big reason for common formats like PNG and JPEG to need
| color profiles is because 8 bits per channel is not enough
| color resolution unless you store the colors non-linearly. 8
| bits is barely enough (or some would say _almost_ enough) to
| represent low-dynamic-range images on TVs and monitors even
| when you have the colors gamma encoded or use a perceptual
| color space.
|
| The problem with storing "ground truth" values for pixels in
| 8-bits-per-channel formats is that many experts would
| probably say this implies a linear color space. And a linear
| color space encoded in 8 bits leaves noticeable gaps between
| colors, such that you can see color banding if you try to
| render a smooth gradient. You lose up to 2 bits of color
| precision in some regions, notably the dark colors.
|
| So, in order to not have color banding, we correct the colors
| before storing them by trying to make the colors have even
| steps between any 1-bit change in the spectrum. The color
| profile is just a record of what was done to correct the
| colors, so someone else can un-correct them as needed.
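|
| For sRGB specifically, that correction is the familiar piecewise
| gamma curve, roughly (constants straight from the sRGB definition;
| sketch only):
|
|       #include <math.h>
|
|       /* sRGB-encoded value in 0..1  ->  linear light in 0..1 */
|       static double srgb_to_linear(double c) {
|           return c <= 0.04045 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
|       }
|
|       /* linear light in 0..1  ->  sRGB-encoded value in 0..1 */
|       static double linear_to_srgb(double l) {
|           return l <= 0.0031308 ? l * 12.92
|                                 : 1.055 * pow(l, 1.0 / 2.4) - 0.055;
|       }
|
| The encoded value is what goes into the 8-bit file, which is why the
| dark end of the range gets more code points than a linear encoding
| would give it.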
|
| You might think we could do away with color profiles for
| high-dynamic-range images, but the non-linearity of our
| vision and our displays means that linear encodings are still
| inefficient with their bits even if you have 32 bits per
| channel.
| mlhales wrote:
| A color space and color profile are not the same. The point
| is knowing what a byte represents in a byte array. Is it red,
| hue, saturation, alpha, brightness, etc.?
| romwell wrote:
| The format in question specifies this though.
| amelius wrote:
| Yes, also store the DPI.
|
| In general, allow arbitrary key-value pairs to be stored.
| makapuf wrote:
| If you want any interop you'd better standardize some keys,
| allowing for extensions. Store those in key,len,data records
| and you have the IFF/RIFF formats ... used by the .avi and .wav
| formats.
| mkl wrote:
| > Yes, also store the DPI.
|
| I don't think that's a good idea. Most images, like photos
| and screenshots, don't have physical dimensions. Some image
| tools like to put in bogus default DPI values that are
| guaranteed to be wrong, and it can be a bit of a pain.
| amelius wrote:
| How else would you reproduce images at exactly the right
| size? Without DPI, the format would be useless for many
| scientific purposes.
|
| But I get your worries. As a compromise, DPI could be
| optional.
| mkl wrote:
| Yes, optional is fine.
| kortex wrote:
| Since we are talking container formats, I would recommend
| something in the IFF/RIFF family (which includes "true" Four-
| byte-type IFFs like RIFF, WAV, WEBP, and variants like PNG,
| MOV, MP4). They are about as simple as you can get as a
| container format. It has its limitations but it's basically the
| simplest "boxed" format you can implement - Type-Length-Value.
|
| I would absolutely recommend basing on some sort of container
| format (vs "raw" data structures), it makes it really easy to
| add functionality without breaking backwards compatibility.
|
| Basically all container formats follow this idea of "collection
| of boxes" pattern. The IFF series are box-length based. Ogg
| uses capture patterns but no explicit box size (I think). The
| former means you have to know the box size ahead of time (or
| seek back and write it in), the latter is better for streaming,
| but harder to seek (without indexes).
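|
| To make the Type-Length-Value idea concrete, a minimal sketch (not
| any particular format's exact layout; RIFF-style little-endian
| length assumed here):
|
|       #include <stdint.h>
|       #include <stdio.h>
|       #include <string.h>
|
|       /* write one chunk: 4-byte type, 4-byte length, then the payload */
|       static void write_chunk(FILE *f, const char type[4],
|                               const void *data, uint32_t len) {
|           uint8_t hdr[8];
|           memcpy(hdr, type, 4);
|           hdr[4] = (uint8_t)(len);       hdr[5] = (uint8_t)(len >> 8);
|           hdr[6] = (uint8_t)(len >> 16); hdr[7] = (uint8_t)(len >> 24);
|           fwrite(hdr, 1, sizeof hdr, f);
|           fwrite(data, 1, len, f);
|       }
|
| A reader just loops "read 8 bytes, dispatch on the type, skip len
| bytes if the type is unknown", which is exactly how unknown
| extensions stay backwards compatible.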
| Asooka wrote:
| I work in a VFX adjacent space, so I have some pretty strong
| feelings about what data should be stored in the colorspace
| tags. The colorspace data should be an arbitrary string. I
| would also like to have a "usage intent" tag for whether this is
| color or non-color data. You would think that the colorspace
| would tell you, but people do silly things like save specular
| maps as sRGB. For completeness I would want the following
| fields:
|
| - colorspace - arbitrary string, the colorspace of the color
| data on disk.
|
| - rgb_primaries - arbitrary string, indicates the physical
| meaning of the linear RGB values.
|
| - transfer_function - arbitrary string, indicates the function
| to use to linearise the values from disk into the RGB
| primaries.
|
| - intent - one of display, texture, data, or an arbitrary
| string. Display images get tone-mapped, textures do not, data
| only gets linearised, even if it has rgb_primaries attached,
| other values only have internal meaning to the concrete art
| pipeline.
|
| For the first three, rgb_primaries and transfer_function take
| precedence over colorspace. The reason I want them is because
| people do mix and match. You could say that you can put the
| same data in some string format in the colorspace tag, but then
| everyone will invent their own slightly different way of
| specifying it and interoperability will suffer. Best to just
| force the values to be unstructured.
|
| A lot of arbitrary strings, but that's how it is - there is
| really no standard list of colorspace names. These four fields
| can cover pretty much everything I have seen people do.
|
| Though currently a lot of people just tag the filename with a
| colorspace suffix and call it a day. OpenColorIO even has a
| function to directly get a colorspace from a filename.
| em3rgent0rdr wrote:
| That metadata could be stored in the filename to keep it out of
| the actual file.
| dylan604 wrote:
| No, the filename is the worst place to put metadata. Nobody
| can control what will happen to a file name.
| em3rgent0rdr wrote:
| well nobody can control what will happen to a file's
| content either if you want to be pedantic.
|
| If you are worried about dumb users renaming it... I think most
| users know not to change things after a '.', so you could just
| put the colorspace code there. And then you could put another
| '.QOI' to end the filename.
|
| If want to keep cruft out of the file itself, then putting
| it in the filename works.
| cyber_kinetist wrote:
| I think the author's intending to use this only as a data
| packing format (ex. for app/game development) and not a
| general-purpose exchange format. If all you want to do is pack
| image/texture data for your game then I totally get why they
| assume RGBA8 storage and leave out metadata such as color space
| information.
| Const-me wrote:
| Videogames don't need color spaces, but they have different
| compression formats on GPUs, like S3TC, BC7, or ASTC. These
| things save not just storage but also VRAM bandwidth.
| zcw100 wrote:
| I wonder if color profiles will be important to images used
| for AI. Right now, as far as I can tell, people generally
| ignore details like that but I wonder if someday people are
| going to say, "oh wait, we've been completely ignoring color
| profiles. We could have been using that" or "we didn't
| realize how ignoring that messed things up"
| kortex wrote:
| I'm in the computer vision space and have looked into the
| effect of color spaces and encodings.
|
| People had basically exactly the epiphany you described.
| For example, there are several color models which try to be
| more lighting-invariant or (racially) color blind (this is
| a very similar problem, humans are all more or less
| "orange" hue-wise due to melanin's spectrum). TSL is one
| such space, there are others.
|
| https://en.m.wikipedia.org/wiki/TSL_color_space
| pornel wrote:
| I know games tend to assume color spaces from context, and do
| other hacky things like reusing alpha for height maps or RGB
| for vector coordinates.
|
| But this approach is a source of endless conflict with image
| processing tooling, because nobody except the game author
| knows what the channels mean. Game developers keep
| complaining about image processing tools doing anything other
| than bytes-in-bytes-out on that data (e.g. clearing "unused"
| alpha or treating RGB as human-perceptible colors), because
| it breaks the unstated assumptions.
| dv_dt wrote:
| Yup embed enough metadata so you can actually nicely
| automate/integrate doing the right thing with tooling.
| jameshart wrote:
| Right - this thread of people asking for colorspaces, dpi
| data, arbitrary metadata collections... all are exhibiting
| precisely the 'expert group' committee mindset that this
| format is supposed to avoid.
|
| If you want all that, there are formats galore that support
| it! The point is more: where are the formats for people who
| don't need all that? Here is one. Please don't try to ruin
| it.
| alkonaut wrote:
| You don't need to support a ton of complex stuff, but if an
| rgb image format _doesn't_ then it should just have a
| version marker and the spec can simply say "the rgb data
| represents sRGB colors".
|
| Just like e.g. some compilers don't support a lot of source
| file encodings but their non-support consists of clearly
| stating that a source file is UTF-8.
|
| Note that this requirement is only for an image exchange
| format. A raw-er image format used internally in e.g. a game
| doesn't need it as badly.
|
| But in an exchange format _every_ user needs to know
| whether the image data is sRGB or Adobe. Because the reader
| didn't encode the data.
| giorgioz wrote:
| >I can almost picture the meeting of the Moving Picture Experts
| Group where some random suit demanded there to be a way to
| indicate a video stream is copyrighted. And thus, the copyright
| bit flag made its way into the standard and successfully STOPPED
| MOVIE PIRACY BEFORE IT EVEN BEGAN.
|
| This bit made me laugh out loud :D
| ur-whale wrote:
| > This bit made me laugh out loud :D
|
| Yeah, it's funny, until you actually go read the MPEG spec.
| documentation.
|
| Then any and all inklings of laughter will very quickly
| evaporate.
| h4kor wrote:
| What does it say about these bits?
|
| I'm currently tinkering around with mp3s and wondered how
| many "useless" bits are in each frame.
|
| I wonder if they are ever used to trace distributors of
| pirated music, similar to yellow dots in printers.
| BlueTemplar wrote:
| Indeed, however remember the mandatory "What colour are your
| bits ?" :
|
| https://ansuz.sooke.bc.ca/entry/23
| barrkel wrote:
| I rather wonder if things like this are used to encode a proof
| of intent.
|
| If something is marked as copyrighted, it may be trivial to
| remove the mark, but at least you had to remove the mark, and
| that's evidence of intent rather than ignorance.
| akx wrote:
| Or a piece of software in your toolchain has been unaware of
| the bit and inadvertently stripped it.
|
| Is that evidence of intent still?
| barrkel wrote:
| I'm not arguing in favour of it, I'm hypothesizing why it
| might be added.
|
| Your toolchain should set the bit last thing before
| distribution, I'd expect.
|
| If you're talking about a toolchain on the consumption side
| after distribution, well, consumers shouldn't be modifying
| content, is what I'd expect the position to be. No
| derivative works and all that.
| zosoworld wrote:
| I'm wondering how scalable this would be for hyperspectral
| images, with 25 channels for example. I'm guessing it's just
| repeating the processing for RGB channels but for 25?
| jameshart wrote:
| Very much the kind of expert group design-by-committee type of
| requirement that this simple compression scheme was designed
| not to handle.
|
| Since this scheme works primarily on exact cross-channel pixel
| matches, no. No it won't work well for 25 channel images.
|
| Which is a good thing. Most people are not trying to encode 25
| channel hyperspectral data and if they are they can use their
| own format designed well for that purpose (or a committee-
| derived format that handles it adequately).
|
| Or they can split the 25 channels across seven 4-channel images and
| encode them using this.
| zosoworld wrote:
| I did not try to imply that this should contemplate the other
| use case, but I was wondering whether the compression
| percentage would scale up with more channels that might not
| be related in the same way RGB are (as in luminance, for
| example).
|
| This effort really makes me want to try to do something
| similar for the other use case, and see how well it fares
| compared to the convoluted standards that are used for this
| topic.
|
| Thank you for your insight!
| blondin wrote:
| this is so satisfying to see. kudos to developer!
| rocqua wrote:
| I immediately thought of a use-case for (a modified version of)
| this: raw compression in cameras.
|
| Currently Sony has two options. Uncompressed raw that is huge, or
| lossy compressed raw that is half the size. They probably chose
| those options because throughput when shooting quickly matters a
| lot, so something like PNG would slow down shooting too much.
|
| I imagine a blazingly fast halfway decent compression algorithm
| would be great. It might even speedup shooting by needing to
| write fewer bytes. But in general people taking star pictures
| will love having compression without it being lossy.
|
| For reference an uncompressed raw file for 44MP is about 90MB.
| That starts filling up drive space rather fast.
| w-m wrote:
| I don't think this method fits camera raws particularly well.
| The real world is noisy, so it becomes very unlikely that you
| would ever hit the two most compressible methods of QOI, (1)
| pixel value repeat or (2) index into one of the previous 64
| pixels. QOI works really well for 8-bit screenshots, with lots
| of pixel-values repetitions. No such luck in noisy 14-bit real-
| world images.
|
| Also, this format was designed to be as simple as possible, and
| is byte-aligned, only handles 8-bit RGBA. You would need a
| different design for 12/14-bit RGB or RGBG (Bayer).
| carrolldunham wrote:
| Could a hash collision make it inexact? so not guaranteed
| lossless?
| [deleted]
| phoboslab wrote:
| No, the encoder checks if the color at the hashed value really
| matches before storing the QOI_INDEX. There's probably better
| ways to hash it, but this just seemed "good enough".
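|
| Roughly this shape, as a sketch (names are made up here; the actual
| qoi.h code differs in the details):
|
|       typedef union {
|           struct { unsigned char r, g, b, a; } rgba;
|           unsigned int v;
|       } px_t;
|
|       /* one lookup, verified before a QOI_INDEX chunk is emitted */
|       static int try_index(px_t index[64], px_t px, int *pos_out) {
|           int pos = (px.rgba.r ^ px.rgba.g ^ px.rgba.b ^ px.rgba.a) % 64;
|           if (index[pos].v == px.v) { *pos_out = pos; return 1; }
|           index[pos] = px;   /* mismatch: the slot is simply overwritten */
|           return 0;
|       }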
| formerly_proven wrote:
| This is pretty neat. A lot like the filtering done in PNG, just
| without slapping zlib on the result. As far as SIMD goes, I agree
| with the author and don't see much potential, as instructions are
| variable-length and most instructions encode deltas. It's
| probably quite close to optimal by just letting the CPU figure it
| out on the micro-level.
|
| If each line were encoded independently they could be en- and
| decoded in parallel (once you add a skip instruction to tell
| where the next line starts, this does preclude streaming of
| intra-line output though). The hit to compression ratio should be
| small as long as images are wide enough for the color buffer
| reset to not matter.
| pedrocr wrote:
| These kinds of formats have been somewhat explored by camera
| makers for their raw formats. Since the computing resources in
| the cameras can be quite limited there are a few formats that try
| to occupy this space of simplicity with still some tricks for
| compression. Samsung's various SRW encodings remind me most of
| this:
|
| https://github.com/pedrocr/rawloader/blob/a59bb78d156277781a...
| IshKebab wrote:
| Yeah I don't think there are any new ideas here. RLE,
| differential encoding and dictionary encoding have all existed
| for decades.
|
| The interesting thing is that it comes close to PNG.
| formerly_proven wrote:
| Lossy RAW is sometimes even simpler than that. E.g. Nikon's old
| compression was just remapping values in a log-like fashion,
| which reduced resolution in the brightest few stops[1] to save
| a few bits per sample.
|
| [1] Perception is pretty logarithmic as usual, so for a 14-bit
| readout, half the values represent the brightest stop and a
| little bit, the other half holds the other 12+ stops.
| pedrocr wrote:
| The simplest form is just to pack 10 or 12 bit values.
| There's all kinds of variations of that:
|
| https://github.com/pedrocr/rawloader/blob/a59bb78d156277781a.
| ..
|
| There are a few formats that use a curve and fewer bits. They
| do become lossy and doing dithering on decompress is useful
| to avoid banding.
|
| The Nikon one you mention was only used very early and is
| decoded by decode_8bit_wtable() in that file. It's just
| looking up the 8 bit value in the table and then adding some
| randomness to prevent the banding.
| nightcracker wrote:
|       QOI_DIFF24 {
|           u8 tag : 4;   // b1110
|           u8 dr  : 5;   // 5-bit red channel difference: -15..16
|           u8 dg  : 5;   // 5-bit green channel difference: -15..16
|           u8 db  : 5;   // 5-bit blue channel difference: -15..16
|           u8 da  : 5;   // 5-bit alpha channel difference: -15..16
|       }
|
| It bothers me more than it should that the 5-bit signed
| differences aren't -16..15 matching two's complement but -15..16
| instead.
| hoseja wrote:
| Yes, this is baffling. Wonder if the format could still be
| amended to fix this?
| selcuka wrote:
| Who knows? Maybe it's the result of a heuristic that achieves a
| slightly better compression ratio.
| phoboslab wrote:
| I should probably fix that. Is there a clever way to unpack a
| 5-bit signed int other than treating it as unsigned and
| subtracting 16?
| benlivengood wrote:
| You'll have to shift the values around anyway, so:
|
|       // N chosen to put the five bits in the highest positions in the int
|       int intermediate = byte_or_word << N;
|       int value = intermediate >> (bits_per_int - 5);
|
| Instead of masking, rely on arithmetic shift right to keep the
| sign bit.
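|
| Another branch-free option, assuming the field has already been
| isolated in the low five bits and uses plain two's complement
| (-16..15):
|
|       /* sign-extend a 5-bit value held in the low bits of u */
|       int value = ((int)(u & 0x1f) ^ 0x10) - 0x10;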
| adgjlsfhk1 wrote:
| That seems easiest.
| wizzwizz4 wrote:
| Especially since your compiler will make it a super magical
| optimised version.
| makapuf wrote:
| Would be interesting to try to use it as is but in a lossy way
| andrewla wrote:
| Very much this -- it will unfortunately have a horizontal bias
| if done naively. I've been exploring the space of doing
| lossy preprocessing to improve image compression, but the
| layered heuristics in PNG make this difficult. This format is
| simple enough that we have a number of approximation heuristics
| -- find the closest color in the last 64, use a smaller delta,
| or preserve one or two channel colors in the next pixel.
| junon wrote:
| To answer the author's question: No, SIMD would not help with
| this.
|
| This is neat. I wonder if the author would be willing to write a
| Kaitai Struct definition for it.
|
| Something else interesting: QOI is 1, 1, 2 letters off from PNG.
| I'm quite certain this is an accident but it's interesting
| nonetheless.
| jiggawatts wrote:
| I suspect SIMD would help with the encoding. The lookup table
| is small enough to fit into 8 AVX2 registers, so instead of
| hashing, you could use direct lookup, which would improve
| compression ratio further (a little bit).
| junon wrote:
| Yes encoding might benefit of course, I was more considering
| decoding speed I suppose.
| jiggawatts wrote:
| There are some clever tricks that can be pulled with the
| latest instructions sets like AVX-512. The registers are
| huge and the instructions available are so varied that
| there are clever ways to use them in "off label" ways to
| implement lookup tables and bit-level parsers.
| Const-me wrote:
| > The lookup table is small enough to fit into 8 AVX2
| registers
|
| Indeed.
|
| > so instead of hashing, you could use direct lookup
|
| However, I don't think that part is gonna work. See the code
| using that table:
| https://github.com/phoboslab/qoi/blob/master/qoi.h#L324-L328
|
| SIMD registers aren't indexable (at least not on AMD64), the
| register needs to be known to the compiler.
|
| Lanes within each register aren't indexable either. The
| insert and extract instructions are encoding lane index in
| the code. There're workarounds for this one, like abusing
| vpshufb or vpermd, but with that amount of overhead I doubt
| SIMD will deliver any profit at all.
| dahart wrote:
| This is pretty neat! The compression rates and encode/decode
| rates are impressive for such a simple rule set.
|
| > SIMD acceleration for QOI would also be cool but (from my very
| limited knowledge about some SIMD instructions on ARM), the
| format doesn't seem to be well suited for it. Maybe someone with
| a bit more experience can shed some light?
|
| I don't know about SIMD encoding, but for decoding at least, the
| core problem is to make some unit of the image (pixels,
| scanlines, blocks, etc) independently addressable (readable). The
| current format is complicated due to pixels of different sizes;
| there's no way to know where data for pixel 1000 starts without
| reading all preceding 999 pixels. You could build a prefix-sum
| array of pixel indices, but that will fully undermine your
| compression.
|
| One question might be how well does QOI work on independent scan
| lines or blocks? Is the compression still pretty good if the
| look-back window is limited in size and restarts every once in a
| while?
|
| To make a GPU capable format, the very first thing you would
| probably need to do is separate the tag bits from all the other
| data, so a thread could identify the pixel type by direct lookup
| into the tag buffer. It then needs to know where to go to find
| the remaining data, so you'd probably need at least one byte of
| data that could be used to reliably locate the rest of the data.
| I don't see an easy way to make this happen for separate pixels,
| but it might not be bad if the image is encoded in blocks or
| scanlines.
| mmastrac wrote:
| Wouldn't it make sense to transcode to a GPU-specific lossless,
| compressed format for rendering?
| dahart wrote:
| Yes, totally. Depends on your use case, but there certainly
| could be good reasons to keep images in compressed form even
| GPU RAM. It'd be ideal if decode is both random-access and
| predictable amounts of work per pixel. This format checks the
| predictable work box, but would need a little design to make
| it random access.
| nathias wrote:
| Very cool.
| javajosh wrote:
| Very cool - I hope browsers support this soon!
|
| Funny meta comment: when I read " _offering a 20x-50x speedup in
| compression and 3x-4x speedup in decompression_ " I caught myself
| reacting with disappointment to the "only" 3x-4x speed up. This
| is funny because 3x improvement is enormous; it's just small
| compared to 30x. There's a lesson somewhere in there about
| sharing stats together to achieve an emotional effect.
| xiphias2 wrote:
| It's trivial to port this to JS / WebAssembly without losing
| speed (using bytearrays)... it's a nice afternoon project
| actually
| Animats wrote:
| Lossless avoids all those issues around human perception. Also,
| if you're storing game assets, some of what you're storing as
| images is not really imagery. It may be normals or height fields,
| which lossy image compression messes up.
|
| The code is interesting, in that it iterates over the output
| array, not the input. This helps prevent write buffer overflows.
| It has read buffer overflows, though. You can craft a compressed
| file which, when decompressed, will contain info from elsewhere
| in memory.
| ur-whale wrote:
| Wondering how well this does on noisy / grainy images since it
| relies so much on pixel similarity.
| EmilioMartinez wrote:
| Notably, pixel similarity along a particular order. If I
| understood correctly something like a vertical rainbow would
| compress significantly differently than the same picture rotated.
| makapuf wrote:
| Noisy but around recurring or slightly different pixels would
| achieve a 1:4 compression rate
| WithinReason wrote:
| All lossless compression algos do badly on noisy images
| [deleted]
| WaitWaitWha wrote:
| I am going to be the fly in this ointment.
|
| I appreciate the premise to develop this, and the science that
| comes with it.
|
| I cannot tell if the author is writing in jest or in earnest when
| he is rude about the historical work that was done before him, yet
| which he relies on. It comes across as hubris with no floor to hold
| him up.
|
| Today's genius is often questioned and found to be a fool of
| tomorrow, and a bit of grace goes a long way.
| kortex wrote:
| Have you _seen_ codecs? I've recently been exploring this
| space and there is not one corner of this universe that isn't a
| lurking Lovecraftian horror.
|
| Un-compressed formats like BMP are sane I guess but most of
| those are wrapped in crufty, sparsely documented container
| formats. Ogg is decent but even it takes quite the deep dive to
| even begin wrangling it.
| kasper93 wrote:
| Neat work.
|
| Though I wouldn't be myself if I didn't have few nitpicks ;)
|
| - You are mixing lossy and lossless compression in your rant. And
| you are mixing containers with formats. The thing about lossy
| compression is that it is by design stupidly slow and complex, in
| order to produce the smallest possible result with the best
| perceived quality. Compress once and trade time for both image and
| compression quality. Lossless is not that bad, although your
| simple solution is, well, simple :)
|
| - I feel like the benchmark suite is lacking. For a better overview
| you should probably include libpng results at both the maximum and
| the lowest compression level. Lossless modes of AVIF and
| WEBP would be nice. (You could also throw in a similar project to
| yours, like https://github.com/catid/Zpng.) Not saying the benchmark
| is bad, but IMHO it doesn't paint the full picture. From a quick
| test I got significantly better compression out of libpng, of
| course at the expense of time, but you didn't configure libpng for
| speed either. So we have some results, but they are not really
| representative IMHO.
|
| - I understand it is a pet project, but don't forget about the
| importance of image metadata, like colorspace and so on.
|
| - Also keep in mind that compression is easy at the beginning,
| but the tricky part is how to squeeze more.
|
| EDIT: formatting...
| jiri wrote:
| This code is nice for RGB, but what about the YUV420 format? There
| should be an even more efficient way to store an image in YUV420
| (it takes half the bytes of RGB 888), and most images (JPEGs, direct
| camera output on Android, etc.) are already in YUV420. I wonder if
| the author has tried YUV420 already and failed for any reason ...
| haxiomic wrote:
| This is impressive, I was curious if it was just better on
| simpler images with many duplicate pixels but it's nice to see it
| working well with photography too
|
| e.g.
| https://phoboslab.org/files/qoibench/images/wallpaper/EwZDbL...
|                   decode ms   encode ms   decode mpps   encode mpps   size kb
|       libpng:         148.4      3995.5         55.88          2.08     12223
|       stbi:           161.0      1858.3         51.50          4.46     19199
|       qoi:             60.8        95.6        136.49         86.78     12868
|
| I'm interested if there's a standardized benchmark used in the
| academic literature this could be tested against
|
| In any case the results are impressive enough that this will 100%
| be used in projects I work on!
|
| Many thanks to the author for their work <3
| nightcracker wrote:
| When I download that PNG file it is 10785 KB. If I re-encode it
| using FLIF 0.4 (to my knowledge still the best off-the-shelf
| lossless image compression) it is 7368 KB. That's 57% of the
| size of qoi, quite a lot smaller. You pay for it in
| encode/decode time though.
| fastball wrote:
| Interestingly I actually managed to shave 400kb off of that
| PNG by just running it through some PNG optimizers. Not great
| savings, but there were still some to be had!
| minus7 wrote:
| FLIF has been superseded[1] by JPEG XL, which, in my very
| limited testing, performed better (speed and compression-
| wise; libjxl) than FLIF.
|
| Interesting though that FLIF 0.4 was released 3 days ago. I
| haven't checked out that new release (and won't), but the
| previous one wasn't great code and was a pain to use (ended
| up using its CLI instead of the lib). We ended up going back
| to PNG because of interoperability and speed. I haven't
| looked at libjxl's API yet.
|
| Edit: 8605KB in 11.5s with libjxl, so it's not actually
| better than FLIF 0.4 (7368KB in 23.9s)
|
| [1]: https://flif.info/#update
| nightcracker wrote:
| Yes, I am aware it has been 'superseded' but in my
| experience for lossless compression it's still usually
| better.
| gfody wrote:
|       pngcrush -brute      10355kb
|       cjxl -q 100 -e 9      8413kb
| haxiomic wrote:
| Can be made smaller for sure, but if I'm say working with
| 10,000 images in my app and decoding all the time, a 3-4x
| speedup in the decode time can be a major saving for ~ 10-40%
| increase in file size with a lib I can just drop in my
| project without effort
| mattashii wrote:
| Nice stuff! But I was confused by the following:
|       QOI_RUN8 {
|           u8 tag : 3;   // b010
|       ...
|       QOI_RUN16 {
|           u8 tag : 3;   // b011
|
| I think there might be a bug lurking in those tag values.
|
| EDIT: Wow, I'm blind. Those 3s were bits consumed by the tag, not
| tag values...
| formerly_proven wrote:
| The :3 means how wide the field is. Instructions are byte-
| aligned, but bit-packed.
| lam0x86 wrote:
| Looks like PCX on steroids
| j-pb wrote:
| > To keep things O(n) when encoding, there's only one lookup into
| this array. The lookup position is determined by a "hash" of the
| rgba value (really just r^g^b^a).
|
| Is a bit imprecise. The algorithm would still be O(n) even with a
| linear search through the "seen pixel array", as it is bounded to a
| length of 64 and therefore a constant factor that gets eaten by the
| Big-O notation.
|
| Because of this hash, the algorithm seems to do something other
| than first described, though. Instead of "a running array of the
| 64 pixels it previously encountered", it's actually the last
| 64 pixel _values_ encountered, which allows for a much
| bigger jumpback.
|
| By deterministically computing this array during execution it
| itself seems to act as a kind of compressed jumpback table,
| allowing for approximated arbitrary jumpbacks with just 6 bits.
|
| Quite impressive!
|
| Edit: I think there might be an interesting tweak that could
| increase the effectiveness of the hashing technique used.
|
| If one were to walk and store the pixels in a Z-Curve order, the
| runtime would stay the same but the locality of the color value
| pool, might be increased.
| tda wrote:
| I was about to post the same idea about a locality preserving
| pixel order. I was thinking about Hilbert curves, never heard
| of Z-curves. Would be nice to see if/how that affects the
| compression ratios. How surprising that a few simple principles
| can give quite a good image format
| mkl wrote:
| https://en.wikipedia.org/wiki/Z-order_curve
|
| Related to this method, it's sometimes used for texture
| mapping on GPUs.
| olaulaja wrote:
| Based on some quick* testing using z-order instead of going
| row by row ends up taking 1-10% less space. Most images were
| on the lower end but I didn't run into any taking more space.
|
| * Swapping to z-order requires about 5 lines of changes if
| you ignore all the little edge cases like non power-of-two
| image sizes. Images used in testing may or may not be
| representative of anything.
| j-pb wrote:
| After making the edit, I saw that someone made the same
| suggestion on twitter!
|
| Hilbert curves have slightly better locality than Z-curves,
| but Z-Curves are trivial to produce.
|
| If you have an N-dimensional space over unsigned integers,
| then you can place each point on a one dimensional Z-Curve by
| interleaving their bits into a new integer.
|
| So for any point (X, Y) in the X-Y coordinates of an image
| the Z-Curve coordinate is X_1,Y_1,X_2,Y_2...
|
| Which can be achieved with a bunch of bit shifts and masks,
| (or a bunch of wires if you happen to use it in an FPGA ^^')
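|
| A scalar version for 16-bit coordinates, using one of the standard
| "magic mask" bit spreads (sketch only):
|
|       #include <stdint.h>
|
|       /* spread the 16 bits of v into the even bit positions */
|       static uint32_t part1by1(uint32_t v) {
|           v &= 0x0000ffff;
|           v = (v | (v << 8)) & 0x00ff00ff;
|           v = (v | (v << 4)) & 0x0f0f0f0f;
|           v = (v | (v << 2)) & 0x33333333;
|           v = (v | (v << 1)) & 0x55555555;
|           return v;
|       }
|
|       /* Z-curve (Morton) index: bits of x and y interleaved */
|       static uint32_t z_index(uint16_t x, uint16_t y) {
|           return part1by1(x) | (part1by1(y) << 1);
|       }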
| formerly_proven wrote:
| CPUs can do this quickly:
|
|       __m128i src = _mm_set_epi32(0, 0, x, y);
|       __m128i res = _mm_clmulepi64_si128(src, src, 0);
|       uint64_t zorder = _mm_extract_epi32(res, 0)
|                       | (_mm_extract_epi32(res, 2) << 1);
|
| Though a blocked iteration order should perform better due
| to better locality.
| j-pb wrote:
| Thanks for the pointer!
|
| If anybody else want to know how this works:
|
| https://stackoverflow.com/questions/30539347/2d-morton-
| code-...
|
| https://en.wikipedia.org/wiki/Carry-less_product
| user-the-name wrote:
| Hilbert curves are not _that_ much more complex. Maybe ten
| or so lines of code. Interleaving bits is already a pretty
| complicated operation if you don't have builtins for it.
| a_e_k wrote:
| I agree that de-interleaving is fairly complicated, but
| interleaving can be done fairly easily with a lookup from
| a table for each dimension and then combining them:
| xbits[x] | ybits[y] | channelbits[c].
|
| The nice thing about this is that just by changing the
| tables, you can get things like z-curve, tiles, row-
| major, or column-major pixel orders (or even
| combinations), plus swizzled, interleaved, or planar
| layouts for the color channels.
| mzs wrote:
| Do you have a link to discussions about QOI on twitter?
| j-pb wrote:
| https://twitter.com/WAHa_06x36/status/1463456861978599426
| ?s=...
| phkahler wrote:
| I added Z-order curve to a ray tracer many years ago.
| Rather than using nested loops over X and Y pixel
| coordinates I used a single loop and de-interleaved the
| count to get X and Y. My thought was that this would
| increase the cache hit rate due to the locality of the
| pixels. It worked and the speedup was on the order of 10
| percent. Because images are rarely power of 2 or even the
| same in both dimensions I opted to make a list of 32x32
| blocks and rendered the blocks in Z-order as well as the
| pixels within (which still helped vs just tiling).
|
| Locality is a good thing to exploit.
| ErikCorry wrote:
| I wonder how much worse the compression would be if you just
| had the last 64 colors, but didn't update when using the run
| length byte codes. Seems like it would be close to the same,
| and the decoder wouldn't need to hash the pixels.
| sltkr wrote:
| > Because of this hash the algorithm seems to do something else
| than first described though. Instead of "a running array of the
| 64 pixels it previously encountered", it's actually the
| previous 64 pixel _values_ previously encountered, which allows
| for much bigger jumpback.
|
| It's somewhere in between. Because these values are hashed into
| single slots, assuming that hash codes are random, the
| probability that a value is still in the table after k
| intervening values is pow(63/64, k).
|
| That means the table is unlikely to contain all 64 values, but
| it will also retain some older values. Since older values are
| less likely to be useful in the future the encoding performance
| is likely somewhat worse than if the algorithm used strictly
| the last 64 distinct values.
|
| This means that the algorithm is a bit sensitive to hash
| collisions, which seem particularly likely for primary colors
| (i.e. with each of the components at value 0 or 255):
|         - Slot 0: black, ALL SHADES OF magenta (x ^ 0 ^ x = 0),
|           yellow (etc.) and cyan.
|         - Slot 15: white (255 ^ 255 ^ 255 = 255), red
|           (255 ^ 0 ^ 0 = 255), green (etc.), blue.
|
| Except black and white, these are unlikely to occur often in
| photographic images but could well occur in bitmaps with a
| handcrafted palette, or when the RGB values are converted from
| a system with a lower color space. For example, the 16 color
| CGA palette maps to just 3 different slots.
| j-pb wrote:
| Yeah, one should probably add a randomisation step for each
| channel for good measure.
|
| With random byte->byte lookup tables those costs should be
| hidden by the cache.
|
| so hash = HR[r] ^ HG[g] ^ HB[b] ^ HA[a].
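|
| Something like this, as a sketch (table names and the seed are
| arbitrary; encoder and decoder would have to agree on the tables):
|         #include <stdint.h>
|         #include <stdlib.h>
|
|         static uint8_t HR[256], HG[256], HB[256], HA[256];
|
|         /* Fill the per-channel tables with pseudo-random bytes.
|            In a real format the tables would be fixed constants in
|            the spec; rand() here only sketches the idea. */
|         static void init_hash_tables(void) {
|             srand(12345u);
|             for (int i = 0; i < 256; i++) {
|                 HR[i] = (uint8_t)rand();
|                 HG[i] = (uint8_t)rand();
|                 HB[i] = (uint8_t)rand();
|                 HA[i] = (uint8_t)rand();
|             }
|         }
|
|         static unsigned pixel_hash(uint8_t r, uint8_t g,
|                                    uint8_t b, uint8_t a) {
|             return (HR[r] ^ HG[g] ^ HB[b] ^ HA[a]) % 64;
|         }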
| ball_of_lint wrote:
| In an image with very limited colors you would have long runs
| of the same color and it would compress well from just the
| run-length encoding alone.
|
| Also, it's non-obvious that older values will be less useful.
| Going through the pixels in order like this doesn't preserve
| locality well, so older values may actually be closer to a
| given pixel than more recent ones.
| MauranKilom wrote:
| > In an image with very limited colors you would have long
| runs of the same color and it would compress well from just
| the run-length encoding alone.
|
| ...unless those images are dithered. Which may be a
| relatively common thing to do in small color spaces!
| kortex wrote:
| Super cool work! I was also mucking around with codecs last week.
| I got some surprisingly good results from delta encoding each
| channel separately and then zstandard compression. Dirt simple,
| crazy fast.
|
| One trap to be aware of when profiling image codecs is not
| knowing the provenance of the test images. Many images will have
| gone through one or more quantization steps. This can lead to
| impressive compression ratios, but if you compare with the
| results on raw, unquantized imagery, it's less of a slam dunk.
|
| That said, PNG is a _really_ low bar to clear, both in terms of
| speed and compression ratio. Even really image-naive compression
| algorithms tend to be within a factor of 2 in resulting
| compressed size. You tend to get diminishing returns. 10% vs 20%
| is twice as efficient, but starting with a 10MB file, 1 vs 2 MB
| file sizes isn't a huge delta. (yes I know there are domains
| where this matters, don't @ me).
| simcop2387 wrote:
| > Super cool work! I was also mucking around with codecs last
| week. I got some surprisingly good results from delta encoding
| each channel separately and then zstandard compression. Dirt
| simple, crazy fast.
|
| My understanding is that this is one of the techniques that PNG
| uses (though not zstd) for doing its compression too. Along
| with, I think, some different strides down the image and some
| other stuff, but I don't fully understand everything it tries.
| That's what optipng and friends play with for parameters to
| find the best settings for decomposing the image for
| compression.
|
| If you look at what JPEG and similar codecs do, they essentially
| take a DC offset out of the image, because that difference then
| greatly reduces the range of values you have to encode, which
| means that you need fewer bits to do so. Combine that with range
| or arithmetic coding and you can get decent efficiency for very
| little work. Then with LZW, zstd, etc. you can get lots of other
| patterns taken out and compressed. Lossy codecs try to take out
| the stuff we won't see with our eyes, so that the set of values
| to be encoded needs fewer bits, and it'll also make more patterns
| appear that can be exploited too.
| kortex wrote:
| Yeah, my technique was equivalent to filter method 0 (the only
| method), filter type 1 (Sub: difference to the pixel to the
| left); literally np.diff(arr.ravel(), prepend=0) in numpy.
|
| I think part of the reason png is so slow is that it
| empirically tests to find the best one.
|
| > Compression is further improved by choosing filter types
| adaptively on a line-by-line basis.
|
| I'd be curious to unpack a png and see the statistics of how
| much each kind is used.
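|
| For reference, filter type 1 ("Sub") is tiny (a sketch of the
| filter itself, not of libpng's implementation):
|         #include <stddef.h>
|         #include <stdint.h>
|
|         /* PNG "Sub" filter: each byte becomes the difference to
|            the byte bpp positions to its left (bpp = bytes per
|            pixel); bytes before the row start count as zero. */
|         static void filter_sub(uint8_t *out, const uint8_t *row,
|                                size_t len, size_t bpp) {
|             for (size_t i = 0; i < len; i++) {
|                 uint8_t left = (i >= bpp) ? row[i - bpp] : 0;
|                 out[i] = (uint8_t)(row[i] - left);  /* mod 256 */
|             }
|         }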
| janandonly wrote:
| Waawh, one would think (certainly, I personally thought) that all
| the _simple_ compression algorithms would have been discovered
| and built by now...
| rowanG077 wrote:
| Interesting I have the opposite view. From intuition I would
| think there is an unbounded set of simple compression
| algorithms. You can probably make much more effective simple
| compression algorithms by making basic assumptions about the
| type of image.
| EmilioMartinez wrote:
| This is patently so. Taken to absurd proportions, you can
| compress a specific rendition of the Mona Lisa with just one
| bit. 1 means Mona, and 0 is undefined behaviour.
| mijoharas wrote:
| Are we about to talk about Kolmogorov complexity? You're
| damn right that we're about to talk about Kolmogorov
| complexity! :)
|
| [0] https://en.wikipedia.org/wiki/Kolmogorov_complexity
| FooBarBizBazz wrote:
| Taking this a step further: 0 means that all subsequent
| bits contain an arbitrary PNG. Now your algorithm encodes
| _any_ image, and still does Mona Lisa in one bit.
| selcuka wrote:
| It's also almost as effective as PNG (plus one bit) for
| non-Mona Lisa images.
| vintermann wrote:
| Well, this is a combination of simple approaches which have all
| been used before, though put together in a nice way. And it's
| not benchmarked against modern formats like lossless webp or
| jpeg-xl.
| lifthrasiir wrote:
| tl;dr:
|         00xxxxxx - copy (x+1)-th last EXPLICITLY ENCODED pixel
|                    (i.e. ignoring repeats)
|         010xxxxx - repeat the last pixel x+1 times
|         011xxxxx xxxxxxxx - repeat the last pixel x+33 times
|         10rrggbb - copy the last pixel and adjust RGB by
|                    (r-1, g-1, b-1)
|         110rrrrr ggggbbbb - copy the last pixel and adjust RGB by
|                    (r-15, g-7, b-7)
|         1110rrrr rgggggbb bbbaaaaa - copy the last pixel and adjust
|                    RGBA by (r-15, g-15, b-15, a-15)
|         1111RGBA [rrrrrrrr] [gggggggg] [bbbbbbbb] [aaaaaaaa] - copy
|                    the last pixel and replace RGBA if each bit flag
|                    is set
|
| So it is essentially PNG filtering limited to the Sub filter, but
| with better delta coding and a larger history window. My guess is
| that it might not work well if the image has a strong vertical
| gradient (which would need the vertical context), but nevertheless
| it's pretty impressive that this simple coding is not all that
| inefficient.
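|
| In decoder terms, telling the blocks apart is just a few
| comparisons on the first byte (a sketch based on the summary
| above; the names are mine, not the actual qoi.h identifiers):
|         #include <stdint.h>
|
|         enum block_type {
|             BLK_INDEX,    /* 00xxxxxx */
|             BLK_RUN_8,    /* 010xxxxx */
|             BLK_RUN_16,   /* 011xxxxx xxxxxxxx */
|             BLK_DIFF_8,   /* 10rrggbb */
|             BLK_DIFF_16,  /* 110rrrrr ggggbbbb */
|             BLK_DIFF_24,  /* 1110rrrr rgggggbb bbbaaaaa */
|             BLK_COLOR     /* 1111RGBA [r] [g] [b] [a] */
|         };
|
|         static enum block_type classify(uint8_t tag) {
|             if ((tag & 0xC0) == 0x00) return BLK_INDEX;
|             if ((tag & 0xE0) == 0x40) return BLK_RUN_8;
|             if ((tag & 0xE0) == 0x60) return BLK_RUN_16;
|             if ((tag & 0xC0) == 0x80) return BLK_DIFF_8;
|             if ((tag & 0xE0) == 0xC0) return BLK_DIFF_16;
|             if ((tag & 0xF0) == 0xE0) return BLK_DIFF_24;
|             return BLK_COLOR;  /* (tag & 0xF0) == 0xF0 */
|         }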
| bonzini wrote:
| > 00xxxxxx - copy (x+1)-th last EXPLICITLY ENCODED pixel (i.e.
| ignoring repeats)
|
| Not exactly. It's "copy the last pixel color for which
| (r^g^b^a)&63 == x".
| lifthrasiir wrote:
| Oh you are right, I only took a cursory look at the header
| file and thought the hashing is only used to speed up the
| lookup.
| miohtama wrote:
| This is so neat. Reminds me of the recent LZ4 post where, by
| manually hand-tweaking the encoding, one could still create a
| decent compression boost after all these decades.
| userbinator wrote:
| I see it as a variant of LZ specialised to images, so its
| performance should not be too surprising. Running an
| uncompressed bitmap through a general-purpose compression
| algorithm tends to yield similar results to PNG.
| chrisdew wrote:
| The current encoding is complete, so no future extensions are
| possible.
|
| A fix would be to change:
|         110rrrrr ggggbbbb - copy the last pixel and adjust RGB by
|                             (r-15, g-7, b-7)
|
| to:
|         1100rrrr ggggbbbb - copy the last pixel and adjust RGB by
|                             (r-7, g-7, b-7)
|
| This leaves:
|         1101???? ...
|
| available for future extensions.
| HanClinto wrote:
| I really like this.
| mkl wrote:
| You might be able to do better compression by traversing the
| pixels in Hilbert [1] order or Z [2] order, to take into
| account vertical neighbours as well as horizontal. It might
| cost time or space though, as you couldn't stream the pixels
| through it anymore - you'd need them all in memory.
|
| [1] https://en.wikipedia.org/wiki/Hilbert_curve
|
| [2] https://en.wikipedia.org/wiki/Z-order_curve
| EdSchouten wrote:
| The nice thing is if M >= B^2 (i.e., total memory is large
| enough to fit a square region of the image, where each
| row/column of the square fits a full block/page of memory),
| you can transform from row/column order to Hilbert/Z-order
| without needing to do more I/Os.
|
| So you can't do such a conversion in a streaming fashion, but
| there is no need to load all data in memory either.
| andai wrote:
| See also: Intel's guide to looping over smaller chunks of
| 2D arrays for better cache utilization:
|
| https://www.intel.com/content/www/us/en/developer/articles/
| t...
| EdSchouten wrote:
| Yeah, those images explain it pretty well. There is one
| slight difference, though.
|
| Because the Hilbert/Z-order curves are defined
| recursively, algorithms for converting between the curve
| and row/column order are "cache-oblivious", in that they
| don't need to take an explicit block size. You write it
| once, and it performs well on any system regardless the
| CPU cache size and/or page size.
| andai wrote:
| I wonder if the pixel data were transformed to/from a
| different order, before and after the
| compression/decompression, if that would speed things up
| without introducing too much slowdown of its own?
| nightcracker wrote:
| You could still stream the pixels, you'd just have to stream
| them in Z order. This may seem pedantic, but this applies to
| any mismatch between the order in which you store pixels and
| the order in which the format stores pixels.
|
| E.g. some file formats may be row-major, others column-major,
| some block-based (like JPEG) or a fancy interleaving scheme
| (like Adam7). Some might store the channels interleaved,
| others separate the channels. If any of these choices doesn't
| match the desired output format, it breaks streaming.
| akx wrote:
| And formats like BMP are stored upside down...
| adrianN wrote:
| Iteration order through arrays has a big impact on
| performance due to cache effects. I'd guess that you'd lose a
| lot of performance by a complicated iteration scheme.
| [deleted]
| sbrki wrote:
| Very nice.
|
| Regarding the comments about complexity of existing formats - I
| agree, with the exception of PNG. In terms of complexity, QOI is
| very similar to PNG's filtering phase, i.e. what you get when you
| strip the primary compression phase (DEFLATE) from PNG.
|
| Also, I admire how the author combined a lot of common and cool
| tricks into QOI:
|
| - Block tags form a sort of a prefix code, making it easy to
| differentiate block types with differing sizes
|
| - Look-back array is used (common trick from LZW family)
|
| - Delta-coding is used when storing pixel diffs
|
| - In the end, if nothing else works, full pixel value is written
|
| Also, the fact that everything is byte aligned is very
| appreciated.
|
| This is a breath of fresh air and really impressive how good
| performance (both processing and size reduction) this gets with
| so little (very readable!) C.
| superjan wrote:
| Traditional compression research usually prioritized
| compression ratio, so more images would fit on your floppy
| disks and would be faster to download over 28.8k modems. It usually
| means increased complexity.
|
| PNG has its own story, as its #1 design goal was to avoid
| techniques that were patented at the time. So they combined a
| simple set of prepasses with zip/gzip compression. There were
| better techniques known at the time, but the idea was that
| combining simple, and known patent-free solutions was the
| safest way to achieve a free format.
| BlueTemplar wrote:
| The most important is probably the difference between lossy
| and lossless compression...
|
| (The GP-mentioned Bink is lossy! Then he also for some
| reason uses a picture tending towards photorealistic, which
| does NOT play well with lossless compression!)
| superjan wrote:
| Yeah there's a difference in techniques used for lossy vs
| lossless. But my point was that in compression research,
| runtime performance was much less important than
| compression performance, that was no different for lossy
| compression.
| jijji wrote:
| Speaking of runtime performance, if you look at LZMA for
| instance, it's horrible on runtime performance but it
| produces some of the best compression ratios of all of
| them.
| khoobid_shoma wrote:
| Yeah, really nice and efficient C programming with a great
| idea!
| KaiserPro wrote:
| > The basic ideas for video compression in MPEG are ingenious,
| even more so for 1993, but the resulting file format is an
| abomination.
|
| can confirm.
|
| Although the worst part is that the documentation is in
| engineering English, so it's quite hard to understand. MPEG2 is
| actually not as bad as you'd imagine. MPEG4 is.
| pantulis wrote:
| This is as cool as it gets.
|
| I have an honest question about the source code. It's been a long
| time since I coded anything meaningful in C, but I remember that
| header files did not have that much code in them, while here I see
| most of the code is in the header file. Why is this the case?
| Inline compilation? What are the advantages?
| CornCobs wrote:
| Single header libraries are hugely more ergonomic and easy to
| use. Just get the file, #include it and you're done! Most C
| libraries on GitHub prefer this mode of distribution now
| phkahler wrote:
| >> Single header libraries are hugely more ergonomic and easy
| to use. Just get the file, #include it and you're done! Most
| C libraries on GitHub prefer this mode of distribution now
|
| I find that odd. How is that significantly better than
| dropping _2_ files into a project and #including the header?
| A good build system should just compile all the C or C++
| files in a given place, so nothing needs to be changed other
| than dropping the files in and #include where called from.
| hn_acc_2 wrote:
| The difference will be painfully clear once you try to walk
| a junior developer through installing and linking new files
| to a large project in Visual Studio, vs "copy this .h file
| here and include it".
|
| Whether or not a "good build system" should handle it, the
| fact that single-file libraries are much preferred these
| days should demonstrate most people don't have such a build
| system
| rytill wrote:
| Why use two files instead of one?
| bluedino wrote:
| What if you want to use the functions in other C files?
| What if you want to compile them all separately?
| hn_acc_2 wrote:
| 1) Include it wherever it's used. The convention is to
| use the preprocessor[1] so it only appears once in the
| final binary
|
| 2) Most developers just want to compile to a single
| binary. Any developer who for some reason needs
| separately compiled objects should be able to quickly
| achieve that from a single-file library
|
| [1] https://github.com/phoboslab/qoi/blob/324c2243b2fb35f
| 741a88d...
| clownpenis_fart wrote:
| a lot of people seem allergic to using the linker these days.
|
| As a result you have to inline all their code, which is usually
| riddled with warnings so you can't compile anymore with sane
| options like -Werror.
|
| It's not like using an actual library is really hard in the age
| of cmake and meson.
| lifthrasiir wrote:
| There is a strong tradition of single-file C/C++ libraries [1]
| because it has been difficult to integrate larger C/C++
| libraries into the codebase.
|
| [1] https://github.com/p-ranav/awesome-hpp
| flohofwoe wrote:
| There's an important difference between STB-style 'single
| file libraries', and C++ 'stdlib-style' header-only-
| libraries: STB-style libraries put the implementation code
| into a separate section inside an "#ifdef IMPLEMENTATION"
| block, while most C++ header-only-libraries use inline code
| (and thus pay a higher compilation cost each time this header
| is included).
|
| STB-style libraries skip the implementation already in the
| preprocessor and compile the implementation only once for the
| whole project.
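|
| The skeleton looks roughly like this (illustrative names, not the
| actual stb macros):
|         /* mylib.h -- STB-style single-file library skeleton */
|         #ifndef MYLIB_H
|         #define MYLIB_H
|
|         /* Declarations: seen by every file that includes us. */
|         int mylib_do_thing(int x);
|
|         #endif /* MYLIB_H */
|
|         #ifdef MYLIB_IMPLEMENTATION
|         /* Definitions: compiled only in the single .c file that
|            does #define MYLIB_IMPLEMENTATION before including
|            this header. */
|         int mylib_do_thing(int x) {
|             return x * 2;
|         }
|         #endif /* MYLIB_IMPLEMENTATION */
|
| Every other file just does a plain #include "mylib.h" and only
| pays for parsing the declarations.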
| sltkr wrote:
| So what's the compelling reason to ship these as a single
| source file, instead of using the more logical and standard
| solution of a single header and a single source file?
|
| Unlike C++ header-only libraries, C libraries like this
| still need their implementation included exactly once.
|
| I understand the benefits of keeping the implementation in
| a single source file (simplifies integrating in existing
| build systems, interprocedural optimizations, etc.) but
| combining the source and header files too makes them
| awkward to use, with little benefit beyond being able to
| claim "our implementation is only 1 file!" which is about
| as impressive as saying "our implementation is only 1
| line!" if you've just deleted all the whitespace.
| lmm wrote:
| > So what's the compelling reason to ship these as a
| single source file, instead of using the more logical and
| standard solution of a single header and a single source
| file?
|
| C doesn't have a packaging format or dependency manager,
| so having to keep track of two files would be pretty
| cumbersome.
| flohofwoe wrote:
| In general, STB-style headers simplify C/C++ build system
| shenanigans.
|
| E.g. one benefit is that you can configure the
| implementation via preprocessor defines without the
| defines leaking into the build system. Just add the
| config defines in front of the implementation include.
|
| It's also trivial to write "extension headers" which need
| to directly peek into the (otherwise private)
| implementation of the base library. Just include the
| extension header implementation after the base library
| implementation.
|
| PS: but also see: https://github.com/nothings/stb#why-
| single-file-headers
| sltkr wrote:
| > In general, STB-style headers simplify C/C++ build
| system shenanigans.
|
| Too vague. How is it better, concretely?
|
| > E.g. one benefit is that you can configure the
| implementation via preprocessor defines without the
| defines leaking into the build system
|
| That's not a benefit. How is this:
|         #define LIB_IMPLEMENTATION
|         #define LIB_KNOB 123
|         #include "lib.h"
|
| Any better than the two-file version:
|         #define LIB_KNOB 123
|         #include "lib.c"
|
| ?
|
| > It's also trivial to write "extension headers" which
| need to directly peek into the (otherwise private)
| implementation of the base library. Just include the
| extension header implementation after the base library
| implementation.
|
| I'm not sure what this means. Can you give an example? Is
| this being used in the above library?
|
| > PS: but also see: https://github.com/nothings/stb#why-
| single-file-headers
|
| OK, that only offers this explanation:
|
| > Why not two files, one a header and one an
| implementation? The difference between 10 files and 9
| files is not a big deal, but the difference between 2
| files and 1 file is a big deal. You don't need to zip or
| tar the files up, you don't have to remember to attach
| two files, etc.
|
| This is kind of nonsense to me. The difference between 2
| files and 1 file is barely relevant, and doesn't weigh
| against the awkwardness of having no clear distinction
| between declarations and definitions, and having to pull
| in definitions by setting a magic preprocessor macro.
| flohofwoe wrote:
| > Any better than the two-file version: >
| #define LIB_KNOB 123 > #include "lib.c"
|
| See, now you have 3 files, the .h/.c file pair of the
| library, and your own implementation source file with the
| configuration which includes a .c file. How is this
| better than having two files?
|
| > Too vague. How is it better, concretely?
|
| I can only assume from that question that you haven't had
| the "pleasure" to work much with C/C++ build systems yet,
| at least when it comes to integrating 3rd party
| libraries.
|
| Re. extension headers:
|         #define IMPLEMENTATION
|         #include "base_library.h"
|         #include "extension_library.h"
|
| Since the implementation of the base library is in the
| same compilation unit as the extension library, the
| extension library has direct access to the private
| implementation details of the base library without having
| to add a separate 'private but actually public API'.
|
| > no clear distinction between declarations and
| definitions
|
| STB-style libraries have this clear distinction, C++
| style header-only libraries don't.
| sltkr wrote:
| > See, now you have 3 files, the .h/.c file pair of the
| library, and your own implementation source file with the
| configuration which includes a .c file.
|
| The third source file is what the library authors
| suggested; I'd just create a separate Makefile target for
| the implementation, which is then linked into the binary.
| (Yes, that's an extra target, but that costs very little
| and allows you to avoid unnecessary rebuilds.)
|
| Why don't you give me a concrete counterexample where the
| single-header-setup makes things _simpler_? If it's so
| obviously beneficial it shouldn't be so hard to come up
| with a concrete example.
|
| > #define IMPLEMENTATION
| > #include "base_library.h"
| > #include "extension_library.h"
|
| Thanks for explaining, but that doesn't really motivate
| having a single file, since you could just as easily do:
|         #include "base_library.c"
|         #include "extension_library.c"
|
| And you've saved a line (and removed the need for obscure
| preprocessor macros).
|
| > ... without having to add a separate 'private but
| actually public API'.
|
| Personally, I'd say this is exactly the right way to
| model this, but even if you disagree: it sounds like this
| simplifies things mainly for the library/extension
| authors, and not the library users, because if the
| extension author forward-declares the implementation
| details they depend on, there is no additional file to
| manage for the users either.
| wizzwizz4 wrote:
| > _The difference between 2 files and 1 file is barely
| relevant,_
|
| To you. To me doing Rust stuff, it's been quite helpful
| to have single-file C programs; it makes it a lot easier
| to compile and link things.
| ur-whale wrote:
| Packing the entire library in a header file is a somewhat
| recent and much welcome trend.
|
| See for example:
|
| https://github.com/nothings/stb
|
| https://github.com/nothings/single_file_libs
| mkl wrote:
| I wouldn't call it recent. Boost has been doing it since it
| was started in the late 1990s.
| ChrisMarshallNY wrote:
| The original C++ templates used to be entirely contained in
| the headers. I assume that is no longer the case, but it
| has been a long time since I wrote C++.
|
| When I was writing template SDKs, the headers were pretty
| much the entirety of the library.
| flohofwoe wrote:
| Boost or C++ stdlib headers are different than STB-style
| single-file-libraries though in that the implementation
| code is inline and needs to be parsed each time the header
| is included.
| tommiegannert wrote:
| Boost did it because C++ template definitions must be in
| headers.
|
| That seems unrelated to this idea that C code using it is
| equally reasonable. That's a developer choice.
| willvarfar wrote:
| Yummy :)
|
| Makes me want to experiment too.
|
| Is there some easy corpus with nice categorization of types
| (drawing, painting, photo, comic, clip art etc) so we can all
| easily compare?
| stillicidious wrote:
| > I want to preface this article by saying that I have no idea
| what I'm doing
|
| Instant hire!
| dbrgn wrote:
| Looks nice!
|
| Did you try fuzzing your implementation (e.g. with AFL++ or
| libfuzzer)? Codecs and especially decoders written in C that
| handle untrusted binary data are quite worthwhile targets for
| fuzzing.
| SavantIdiot wrote:
| What I like about this approach is that it requires zero
| differential calculus. People forgot that Radix2 DCTs come from a
| century's worth of advanced math in the S domain. This approach
| is applied-computer-science based; I can get my head around that
| a lot better than a Laplace/Z transform.
|
| But what shocks me is how did absolutely no-one apply such a
| naive RLE in the 30 years of internet image formats?
|
| Is this a case of "man reinvents wheel again because why do
| research" or is this exceptional?
| a_e_k wrote:
| > But what shocks me is how did absolutely no-one apply such a
| naive RLE in the 30 years of internet image formats?
|
| Off the top of my head, both PCX [0] and TGA [1] use similarly
| naive RLE as their compression method.
|
| And while they also offer fancier compression methods, BMP [2],
| TIFF [3], and OpenEXR [4] all offer RLE modes as well.
|
| [0] https://en.wikipedia.org/wiki/PCX
|
| [1] https://en.wikipedia.org/wiki/Truevision_TGA
|
| [2] https://en.wikipedia.org/wiki/BMP_file_format
|
| [3] https://en.wikipedia.org/wiki/TIFF
|
| [4] https://en.wikipedia.org/wiki/OpenEXR
| Trixter wrote:
| RLE on runs of deltas is in very many compression schemes; it's
| a common technique (probably so common that many authors don't
| deem it worth mentioning). Lossless video codecs in particular
| (huffyuv, lagarith, utcodec, etc.) have had this going back 20
| years at least.
| wongarsu wrote:
| Is it just me, or is the compression algorithm not really less
| complex than the one in PNG? I will grant the author that if you
| don't need features like support for color spaces, different bit
| depths or interlacing you can skip the entire container format.
| But for reference, this is the core part of PNG:
|
| For each scanline write one byte for the chosen filter method,
| then $width pixels. There are five filter types. In type 0, write
| the pixels unmodified. In type 1, write the difference to the
| previous pixel. In type 2, write the difference to the above
| pixel. In type 3, write the difference to the average of the
| above and left pixel. In type 4, write the difference to the
| Paeth predictor (a simple function calculated from the above,
| left and left above pixel). Then compress everything with
| deflate.
|
| Obviously that isn't performant by modern standards, not least
| because deflate sucks by modern standards. But it's simple, can
| be implemented in a couple of lines of C (if you pull in deflate
| as a dependency), and if you don't do exhaustive searches for the
| best compression it can certainly run in O(n).
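|
| For reference, the Paeth predictor mentioned above is this small
| function (a sketch following the PNG spec's description):
|         #include <stdlib.h>
|
|         /* Pick whichever of left (a), above (b) and upper-left (c)
|            is closest to the linear estimate a + b - c. */
|         static int paeth(int a, int b, int c) {
|             int p  = a + b - c;
|             int pa = abs(p - a);
|             int pb = abs(p - b);
|             int pc = abs(p - c);
|             if (pa <= pb && pa <= pc) return a;
|             if (pb <= pc) return b;
|             return c;
|         }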
| gpvos wrote:
| The 64-pixel lookback makes it a bit similar to LZ-style
| compression like Deflate, but the latter is not O(n) because it
| can recognize multi-pixel sequences. QOI is just a slightly
| beefed up run-length encoding.
| ErikCorry wrote:
| I like the small size of the decoder state - not much more than
| 256 bytes. I haven't found a way to decode GIF with less than
| about 16k of mutable state. PNG is similar though you can use the
| decoded image if it's available.
| benlivengood wrote:
| I wonder if the author tried making pixel diffs capable of using
| lookback. They'd be one byte longer but would possibly happen a
| lot more often than full pixel values. It's also reminiscent of
| the delta blocks from MPEG. If anything I would trade the 5-bit
| run for delta-from-lookup on a whim (the cases where the ratio
| between short runs vs. exact lookback is extreme seems like it
| might only happen for pinstripe patterns). I should probably
| implement the change and test it before posting...
| adgjlsfhk1 wrote:
| The hard part here is figuring out which pixel to diff to. You
| would almost certainly have to look at all 64 pixels which
| would be a lot slower. That said, you can probably get
| noticeably better compression from it.
| benlivengood wrote:
| I think O(log(M)) where M is the number of remembered pixels.
| Index them by their (R,G,B,A) quad and do a range lookup
| sequentially by pixel, since any valid pixels would be within
| the same range as earlier potential matches.
| SkeuomorphicBee wrote:
| Really cool encoding, it is a bit incredible that such simple
| rules can achieve such good results.
|
| It is so interesting that if the image has few colors, method #2
| alone (an index into a previously seen pixel) encodes the whole
| image into an indexed colour space (with the hash table becoming
| the palette).
|
| I've seen already some comments recommending adding some vertical
| locality by tiling, Hilbert order, or Z order. I have yet another
| different vertical locality suggestion to try:
|
| While encoding/decoding, keep a ring buffer the size of one full
| line of the image, storing each encoded/decoded pixel there. Then
| add a variation of method #3 (The difference to the previous
| pixel) which instead encodes the difference to the pixel above
| the current (the last pixel from the ring buffer). Maybe, due to
| the limited "tag" space it would not be worth it to implement it
| for DIFF8, but only for DIFF16 (taking one bit from red to the
| additional tag bit) and DIFF24 (I'm not sure which bit I would
| steal there).
|
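| Roughly, the bookkeeping is just this (a sketch; the tag layout
| for the new "diff to above" block is left out):
|         #include <stdint.h>
|
|         typedef struct { uint8_t r, g, b, a; } px_t;
|
|         /* One-row ring buffer: prev_row[x] always holds the pixel
|            produced exactly `width` pixels ago, i.e. the pixel
|            directly above the current one. */
|         typedef struct { px_t *prev_row; int width, x; } row_ctx_t;
|
|         /* Call once per pixel: fetch the pixel above, then record
|            the pixel just encoded/decoded and advance the column. */
|         static px_t above_and_store(row_ctx_t *ctx, px_t current) {
|             px_t above = ctx->prev_row[ctx->x];
|             ctx->prev_row[ctx->x] = current;
|             ctx->x = (ctx->x + 1) % ctx->width;
|             return above;
|         }
|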
| Edit: Also, when thinking about video encoding in the future you
| may want to keep a ring buffer fitting the full last frame of
| video. And then temporal locality is probably even more
| important, so then you may want to change the tags to encode more
| commands relating to the pixel from the previous frame.
| tda wrote:
| nice idea, I think it is somewhat more in line with the keep it
| simple option, as I e.g. have no idea what a Hilbert curve for
| an arbitrary size rectangle looks like. And you still get
| compression for x and y directions (though no run length
| encoding the vertical, perhaps a further addition).
|
| I quite like this approach of defining a few compression
| methods, see which one works best, and apply that one. I used a
| similar strategy for time series compression where I would take
| a chunk of data, apply a few different compression algorithms
| on it (RLE, delta RLE, etc) and just keep whichever one was
| smallest. Of course here it is done on a per-pixel basis, and
| I used arbitrary size chunks. The latter is trivial to
| parallelize. Same could be applied here though, split the image
| in half and compress both parts independently
| gfody wrote:
| having done my own experiments with indexed palettes I suspect
| method #2 is responsible for most of the compression here. most
| photographs only have half a million distinct colors and it
| drops considerably as you focus in on an area. if you're
| willing to relax the single-pass requirement, partitioning the
| whole image by palette similarly to how one encodes high-color
| images into multi-framed gifs can yield surprisingly good
| compression.
| a_e_k wrote:
|         QOI_DIFF16 {
|             u8 tag : 3;   // b110
|             u8 dr  : 5;   // 5-bit   red channel difference: -15..16
|             u8 dg  : 4;   // 4-bit green channel difference:  -7.. 8
|             u8 db  : 4;   // 4-bit  blue channel difference:  -7.. 8
|         }
|
| I'd think that the extra bit ought to be given to green since
| that tracks most closely with luminance. That would make this
| somewhat similar to the RGB565 layout for 16-bit pixels
| (https://en.wikipedia.org/wiki/High_color#16-bit_high_color).
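|
| That would make the layout something like this (just a sketch of
| the suggestion, in the same pseudo-struct notation as above):
|         QOI_DIFF16 {
|             u8 tag : 3;   // b110
|             u8 dr  : 4;   // 4-bit   red channel difference:  -7.. 8
|             u8 dg  : 5;   // 5-bit green channel difference: -15..16
|             u8 db  : 4;   // 4-bit  blue channel difference:  -7.. 8
|         }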
| Ono-Sendai wrote:
| Very cool. Typo: "I barely understand how Huffman Coding and DTC
| works" I guess you mean DCT? (discrete cosine transform)
|
| One other thought: To make it so that compression and
| decompression can be multithreaded, you might want to 'reset' the
| stream every N rows. (i.e. break any RLE runs, start from a
| colour literal). This would allow a bunch of threads to start at
| different places in the image, in parallel. There would be some
| small cost to compression ratio but might be worth it.
| phoboslab wrote:
| Thanks, fixed!
|
| I'll probably investigate "resetting" the stream to allow for
| multithreaded en-/decode when I try to roll this into a video
| codec.
| mkl wrote:
| Even better might be breaking it into tiles, so you get some
| vertical locality happening as well as horizontal.
|
| It would be interesting to know which strategy is used for
| each pixel. Have you tried making maps of that with four
| different colours? Particularly interesting would be how much
| the cache is used and also how often it replaces a colour
| that it could have used later. Maybe there's a better colour
| hash. BTW you might want to change "(really just r^g^b^a)" to
| "(really just (r^g^b^a)%64)".
| iainmerrick wrote:
| I really like the trick of caching previously-seen pixels
| by value rather than position. Very nifty!
|
| I was thinking this would combine well with progressive
| rendering (i.e. store pixels in mipmap-style order) so the
| cache is warmed up with a spread of pixels across the image
| rather than just scanline order. That doesn't make it
| parallelizable in the way tiling does, though, hmm.
|
| Another tweak that would probably help is having the file
| specify a rotation (or even a full mapping table) for each
| component of the hash, so a clever encoder can pick values
| that minimise collisions. (A fast encoder can just use all-
| zeroes to get the same behaviour as before.)
| Trixter wrote:
| Before you start rolling this into a video codec, please
| consult the sources of UT Codec, which has been state of the
| art in this space for many years. Or, at least, use UT Codec
| as a direct basis for comparison, to see if you are beating
| it in terms of compression/decompression speed.
|
| Modern lossless video codecs don't care that much about
| saving space, since you're burning gigabytes per minute
| anyway; the key is that the compression and decompression are
| as transparent as possible, to reduce the bandwidth to/from
| the storage medium. A good lossless codec can be used to
| scrub through and edit on a video editor's timeline, so
| decomp performance is what's most important.
|
| Also, any such lossless video codec is going to need more
| than 8 bits per component; most real-world footage coming off
| of phones and cameras is 10-bit Rec.2020. If it is too
| difficult to position QOI for real-world sources, you can
| certainly keep it 8-bit and market it as an animation codec
| or similar; just keep it in mind.
| londons_explore wrote:
| This is a great project, but personally I feel that there are
| lots of uses for uncompressed images (ie. random access, ready to
| render straight away), and lots of uses for well compressed
| lossless, but little use for 'slightly compressed' images.
|
| The normal argument for 'slightly compressed' images is that they
| can be decompressed with very few CPU cycles, but the reality is
| that with modern platforms lossless image en/decoding is done in
| hardware and therefore pretty much 'free', as long as you pick
| lossless h264, lossless webp or lossless hevc. All of those
| things will also give much smaller file sizes.
|
| In the few places where hardware decode isn't a thing (eg. the
| memory-constrained 8 bit microcontroller decoding the boot screen
| for your IoT toaster), there usually isn't a need for
| losslessness.
| iainmerrick wrote:
| "Slightly compressed" is heavily used in 3D graphics -- see
| ETC2, DXT, etc.
|
| It's useful because the size savings apply _in memory_ ,
| because decompression is so cheap it can be done on the fly.
|
| It has to be lossy because you want a guaranteed fixed-rate
| compression ratio (eg 4x) and there's no way to do that
| losslessly.
| makapuf wrote:
| for a good simple and fast image compressor look at DXT (decoded
| natively by graphics cards). No benchmarks for encoder
| performance though.
| st_goliath wrote:
| Out of curiosity, I modified the benchmark program to simply toss
| the pixel data into lz4/lz4hc for comparison (patch file:
| https://paste.debian.net/1220663/).
|
| I'm actually quite impressed how the resulting size is a little
| bit smaller than lz4hc (actually not even that far away from
| libpng), while the encoding speed is quite close to lz4, despite
| seemingly not having as many hacks and tweaks under the hood to
| get there. So the encoder is IMO doing amazingly well actually.
|
| However, the _decoding_ speed of liblz4 is off by roughly an
| order of magnitude. But again, liblz4 probably benefits from
| decades of tweaking & tuning to get there. It would be
| interesting how much performance could be gotten out of the qoi
| decoder.
|
| Here are the _totals_ I get for the "textures" benchmark
| samples:
|             decode ms  encode ms  decode mpps  encode mpps  size kb
|   libpng:         2.2       32.4        58.67         3.94      160
|   stbi:           2.1       17.0        61.50         7.49      228
|   qoi:            0.7        0.7       191.50       170.54      181
|   lz4:            0.1        0.6      1226.06       206.40      258
|   lz4hc:          0.1       70.9      1029.26         1.80      200
|
| And for the "wallpaper" samples. The spreads seem to be a bit
| more in favor of qoi here: decode ms
| encode ms decode mpps encode mpps size kb libpng:
| 131.9 2287.1 66.63 3.84 8648
| stbi: 147.5 1177.1 59.55 7.46
| 12468 qoi: 56.3 56.5 156.13
| 155.60 9981 lz4: 14.3 53.1
| 614.53 165.50 18019 lz4hc: 13.9
| 1901.7 630.94 4.62 12699
| injinj wrote:
| I believe libpng default zlib compression level is 6. libpng
| may use CRC (or adler) checksums, which account for around 30%
| of the decode time. In my experience with zlib, the higher
| compression levels have faster decode times. My reasoning at
| the time was (early '00s) that higher compression tends to code
| longer runs of copying of the dictionary. More time in the copy
| loop rather than the input and decode loop.
| zenogais wrote:
| Like the author said this is completely unoptimized. The
| natural next step in optimization might be to profile and then
| SIMD optimize the slow bits in compression and decompression.
| This would likely produce a significant speedup and may even
| bridge the gap with lz4.
| Trixter wrote:
| > liblz4 probably benefits from decades of tweaking & tuning to
| get there
|
| Decade, singular. LZ4 appeared around 2011. But the real
| reason it's faster is that it's not doing nearly as much work
| as QOI. LZ4 is just about as asymmetric as you can get.
| willvarfar wrote:
| I believe (from quick code inspection) that the symmetry in
| encode/decode performance for QOI is because it has to generate
| the hash-table on decode too.
|
| Normal fast encoders use some kind of hash table or search to
| find matches, but encode the offset of the match in the source
| rather than in the table. QOI is encoding the offset into the
| table, which gives much shorter offsets but means the decoder
| has to maintain the table too.
|
| (The slow PPM compressors etc do maintain the same table in
| both encoder and decoder, and have symmetrical encode/decode
| performance too. See http://www.mattmahoney.net/dc/text.html)
|
| It's not really in the spirit of QOI to add the complexity,
| but I'd imagine allowing the encoder to specify how big the
| hash table is would be only a small tweak, and encoding
| literal blocks instead of going pixel-by-pixel would improve
| the handling of input that QOI can't compress.
|
| I would be curious to know if planar helps or hinders. Perhaps
| even see what QOI makes of YUV etc. And I want to see 'heat
| maps' of encoded images, showing how cheap and expensive parts
| of them are, and which block type gets used. Yeah, going a bit
| beyond the spirit of QOI :D
|
| And from looking at the code I'm a bit confused how 3-channel
| is handled for literals because the alpha still looks like it
| gets encoded. Would have to walk through the code to understand
| that a bit better. Is it going to deref off the end of the
| source RGB image? Etc.
|
| (People interested in compression may be interested in the go-
| to forum for talking about it
| https://encode.su/threads/3753-QOI-(Quite-OK-Image-
| format)-l...)
| eisbaw wrote:
| What about qoi followed by lz4?
| renonce wrote:
| So it looks like the algorithm diffs neighbouring pixels, assumes
| that the difference is likely to be small, and encodes the
| difference in 1, 2, 3 or 4 bytes. It does this without complex
| things like Huffman trees and therefore gets a major improvement
| in performance while approaching the compression ratio of zlib.
| That's interesting.
| yboris wrote:
| Obligatory comment about the new file format that's going to
| replace jpegs: _JPEG XL_
|
| https://jpegxl.info/
|
| It also has lossless setting (in addition to being better at
| compression when you want a lossy format).
| eisbaw wrote:
| I'd like to see this in xpra.
|
| Does anybody have an idea if QOI+lz4 would improve compression
| ratio further?
| wwalexander wrote:
| Are there compression algorithms that take longer than O(n)? My
| impression is that even two-pass encoding is O(n) with a large
| constant factor in front of it, but I've never actually thought
| this much about it.
| selcuka wrote:
| Some LZ variants such as LZ77 and LZW are O(n), but LZH, for
| example, is O(n*logn) because of Huffman encoding. I agree that
| O(n) is not revolutionary, but the simplicity of this algorithm
| makes it faster to execute.
| eutectic wrote:
| Isn't the log factor only in the alphabet size?
| bob1029 wrote:
| This is very impressive work.
|
| One thing I did recognize after my last trip into the rabbit hole
| - JPEG can be the fastest format by orders of magnitude if you
| are using an appropriate encoder. The project mentions 20x-50x
| speedups over PNG which are commendable, but once you throw
| LibJpegTurbo into the mix (it does use SIMD), you will find a
| substantially different equation on your hands. I can do 1080p
| 4:4:4 JPEG encodes in ~3ms on my workstation today.
|
| I am not saying you should always use jpeg (some people like to
| keep all their original pixels around), but if you want to go
| really _really_ fast you should consider it. The latency is so
| low with this approach that I have been looking at the
| feasibility of building a streaming gaming platform around it.
| Current status of that effort is somewhere between "holy shit it
| actually works" and "maybe?"
| bno1 wrote:
| JPEG is not lossless though.
| dexen wrote:
| I am in awe of how the post neatly interleaves the why and the
| how. Having recently implemented an LZ4 decompressor (itself a
| rather reasonable and "lightweight" standard), I like QOI even
| better. Hoping to see the video codec the author mentions as a
| possibility!
___________________________________________________________________
(page generated 2021-11-24 23:00 UTC)