[HN Gopher] Finally Understanding PNG (2017)
___________________________________________________________________
Finally Understanding PNG (2017)
Author : Asdrubalini
Score : 117 points
Date : 2021-05-26 08:42 UTC (14 hours ago)
(HTM) web link (compress-or-die.com)
(TXT) w3m dump (compress-or-die.com)
| nayuki wrote:
| This article talks about understanding the PNG pixel data and
| compression, which is wonderful. Recently I made a tool to
| understand another aspect of the PNG file format, which is the
| full list of chunks, metadata chunks, and chunk fields.
| https://www.nayuki.io/page/png-file-chunk-inspector
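|
| For anyone curious, walking the chunk list is only a few
| lines. A rough Python sketch (the layout is: 8-byte signature,
| then chunks of 4-byte big-endian length, 4-byte ASCII type,
| data, 4-byte CRC; "example.png" is a placeholder):
|
|     import struct
|
|     with open("example.png", "rb") as f:
|         # Every PNG starts with the same 8-byte signature
|         assert f.read(8) == b"\x89PNG\r\n\x1a\n"
|         while True:
|             header = f.read(8)
|             if len(header) < 8:
|                 break
|             length, ctype = struct.unpack(">I4s", header)
|             f.seek(length + 4, 1)  # skip data and CRC
|             print(ctype.decode("ascii"), length)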
| acqq wrote:
| I'm old enough to feel it must be noted: "HDD" was _never_
| used for diskettes, even the "high density" ones.
|
| HDD:
|
| https://en.wikipedia.org/wiki/Hard_disk_drive
|
| Diskettes:
|
| https://en.wikipedia.org/wiki/Floppy_disk#3+1%E2%81%842-inch...
|
| I'm commenting "HDD" as written in: "1.44 MB. This is also pretty
| much the size of an uncompressed BMP image (and incidentally the
| size of a 3.5 " HDD ... for all who can still remember" in the
| current text of the article.
| dahart wrote:
| Oh that's funny, I am also old enough, but I'd have sworn I
| used floppy disks that had "HDD" on them. I think you're right
| though. All the pics are showing "2HD" and the more rare "3HD".
| There were disks with "High Density Diskette" printed on them,
| and initialisms "HD" and "DD" (double density). But I can't
| find any examples of HDD outside of hard disks.
| H8crilA wrote:
| Low quality/tangential comment:
|
| It was a _floppy_ disk, not a _hard_ disk. The magnetic surface
| in an FDD disk is floppy; the magnetic surface in an HDD is
| hard (rigid).
| afavour wrote:
| > The magnetic surface in an FDD disk is floppy; the magnetic
| surface in an HDD is hard
|
| Maybe I'm wrong but I was under the impression that a
| "floppy" was called that because the original ones - 5.25" -
| _were_ floppy.
| dahart wrote:
| I think that is true, but it should be noted that the
| _original_ ones were 8 inches. :)
| https://en.wikipedia.org/wiki/Floppy_disk
| pansa2 wrote:
| > _the original ones - 5.25"_
|
| Nitpick: The first floppy disks were 8" in size.
| H8crilA wrote:
| The 3.5" also had a floppy disk inside, only surrounded
| with a plastic rigid case.
| lonelygirl15a wrote:
| "Floppy" refers to the media, not the packaging; some
| (3.5") had harder plastic shells, but inside was the same
| floppy plastic material. Hard disk platters were/are rigid
| (metal or glass).
| dahart wrote:
| Yes that became true after 3.5 inch disks with hard
| shells came out and were called "floppy", but before that
| all the floppy disks had soft outer shells, and I believe
| disks were named floppy for exactly that reason, not
| because the inner media was floppy.
| tym0 wrote:
| Even the 3.5" were floppy inside, they just had a hard case
| around it.
| UglyToad wrote:
| At some point I'm going to get round to adding heuristic filter
| selection to my PNG library BigGustave [0]. I have a TODO note
| for how to start implementing it.
|
| Though to be honest, using a palette for colors tends to
| produce a huge saving for a lot of image types (those with 256
| or fewer colors, so documents, not natural scenes), making
| filter selection a small added bonus.
|
| [0]:
| https://github.com/EliotJones/BigGustave/blob/master/src/Big...
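|
| For a rough idea of the palette saving, something like this
| Pillow sketch (assuming Pillow is installed; filenames are
| placeholders) often cuts document-style images down a lot:
|
|     from PIL import Image
|
|     # Reduce a truecolor image to a 256-color palette (PNG
|     # color type 3) and let the encoder optimize the result.
|     img = Image.open("document.png").convert("RGB")
|     img.quantize(colors=256).save("doc-pal.png", optimize=True)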
| ericpauley wrote:
| Nit: it's somewhat disingenuous that the first two images
| aren't actually encodings of the same source, since noise was
| added to the right image to accentuate the paletting effects.
| If the image on the right didn't have noise the effect would
| be far less pronounced.
| Hamuko wrote:
| Fun PNG fact: the official pronunciation of "PNG" is actually
| "ping": http://www.libpng.org/pub/png/spec/iso/index-
| object.html#1Sc...
|
| I imagine there won't be a mainstream culture war over it like
| there is with GIF and JIF though.
| drsopp wrote:
| > A PNG is compressed in three steps: pre-filtering, dictionary-
| based coding via LZ77, and finally entropy coding according to a
| certain Mr. Huffman.
|
| Huffman was a Doctor of Science.
|
| Also, (2017).
|
| Edit: Couldn't find OP's email. I don't have Twitter.
| TonyTrapp wrote:
| > These filters are the reason why the PNG is a nose ahead of the
| GIF format and is therefore almost always smaller
|
| Not really. I found that for many many many pictures, not using a
| filter turns out to be the smallest option. It's really just that
| DEFLATE is much better than LZW as used by GIF.
| masklinn wrote:
| > This is due to the fact that the specification of the PNG has a
| sliding window of 32 kB for performance reasons.
|
| It's a bit of a shortcut: in reality PNG uses DEFLATE, and it's
| DEFLATE which has a 32kB window.
|
| And it's not exactly performance reasons but memory budget,
| especially as deflate was designed for _streaming_, both when
| compressing and decompressing: when decompressing you need a
| buffer at least as large as the window, and when compressing you
| need that plus most likely an index of some sort (so you don't
| have to look for matches linearly through the buffer).
|
| This is more relevant with the context that Phil Katz created
| deflate in 1991: high-end desktops reached high single-digit
| megabytes of RAM, but the low end might not even be there
| (Windows 3.0 required 1 MB to run decently).
|
| By comparison, if memory serves zstd uses a window of 8 MB (or
| 16 MB depending on settings), though it is configurable
| anywhere from 1 kB to 512 MB.
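|
| You can see the window trade-off directly with Python's zlib,
| which exposes the DEFLATE window as wbits (rough sketch;
| "big.txt" is a placeholder input):
|
|     import zlib
|
|     data = open("big.txt", "rb").read()
|     # wbits selects a 2**wbits byte window; 15 (32 kB) is the
|     # maximum DEFLATE allows. Negative means raw deflate.
|     for wbits in (9, 12, 15):
|         co = zlib.compressobj(9, zlib.DEFLATED, -wbits)
|         out = co.compress(data) + co.flush()
|         print(2 ** wbits, "byte window ->", len(out), "bytes")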
| userbinator wrote:
| ...and 32k allows the buffer to fit into a single 64k segment
| along with the Huffman tables, no doubt a useful property at
| the time. Deflate is actually very similar to LZH, and there
| are small extensions to both which make the window 64k; but in
| my experience it doesn't provide all that much extra
| compression (a few percent).
| magicalhippo wrote:
| > no doubt a useful property at the time
|
| For reference:
| https://en.wikipedia.org/wiki/X86_memory_segmentation#Real_m...
| jandrese wrote:
| I find it amusing when reading the docs on compression programs
| and seeing statements like "Compression level of -9 should be
| used with caution as it can consume over 8 megabytes of main
| memory".
| blowski wrote:
| I learned a lot from that, thank you.
|
| I wonder how much bandwidth I've wasted for people over the years
| because I've saved poorly optimised versions from Photoshop. Are
| there automated tools that can be included in a CI pipeline to
| help with this?
| chris_overseas wrote:
| Squoosh (https://github.com/GoogleChromeLabs/squoosh) gives
| very impressive results in my (admittedly somewhat limited)
| experience.
| willis936 wrote:
| For graphics with only a few colors you can save a few extra
| bits by moving to lower bit depths. PNG explicitly supports an
| 8-bit palette color mode that is pretty lean. You can even
| specify fewer than 8 bpp if your palette is small enough to
| have its cardinality fall within lower powers of 2.
|
| I recall PNG8 not being well supported in some applications.
| I've run into this with game engines, which I found strange
| since asset size is an important thing to keep in mind.
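|
| With Pillow you can ask for the lower bit depths explicitly
| via its (experimental) "bits" save option, if I remember the
| API right; a sketch with placeholder filenames:
|
|     from PIL import Image
|
|     # 16 palette entries fit in 4 bits per pixel.
|     img = Image.open("icon.png").quantize(colors=16)
|     img.save("icon-4bpp.png", bits=4, optimize=True)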
| giantrobot wrote:
| IE6 and below had no end of trouble with PNG transparency.
| IIRC it supported transparency correctly only on PNG8 images,
| and only in img tags, not backgrounds, and it ignored the
| alpha channel entirely on PNG32.
|
| There were some IE-specific hacks you could use with their
| proprietary CSS extensions but they were really annoying. I
| think IE7 finally fixed the transparency bugs but since there
| was still a huge IE6 cohort it was still a crapshoot relying
| on transparent PNGs.
| ogurechny wrote:
| I think the first step of any optimization strategy is to
| understand what you are doing, and to state a goal that can
| clearly either be reached or not. Size crunching on a couple
| of images is fun; valiantly processing a sizable collection of
| pictures with optipng makes you realize that your CPU(s) are
| glowing hot for hours just to take 10 bytes off a file that
| still fills the same number of disk sectors. That's an even
| bigger waste of energy than mining.
|
| One particular trick is running images with a visually low
| number of distinct colors, like logos and screenshots, through
| pngquant. You don't have to worry whether the actual number of
| colors is less than 256, how to posterize or filter the image
| to decrease that number, or how to suppress artificial colors
| from font anti-aliasing -- the tool measures the visual
| likeness, and provides the means to set an acceptable limit so
| that it leaves unsuitable images intact. Obviously, when
| you're dealing with long-term archival storage, or assets for
| further editing like stock clipart, such presentation-oriented
| lossy optimization should not be used.
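|
| Roughly, from a pipeline (a sketch; the flags are the usual
| pngquant ones but double-check them, and the filename is a
| placeholder):
|
|     import subprocess
|
|     # --quality sets the acceptable range; pngquant leaves the
|     # file alone (non-zero exit) if it can't meet the minimum.
|     subprocess.run(
|         ["pngquant", "--quality=70-95", "--skip-if-larger",
|          "--force", "--ext", ".png", "screenshot.png"],
|         check=False,
|     )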
|
| Other automatic tools are referenced in the article, though
| you should treat them as a nice sanity check that prevents
| images holding raw uncompressed data, or simple two-color
| graphs saved as 4-byte RGBA, from appearing in your project,
| and not as a magical solution that will send its performance
| to the sky.
|
| Bandwidth (and loading time) is usually wasted on complex
| high-resolution images like photos or combined graphics with
| photographic details. These can't really be optimized well
| enough in PNG with naive lossless or lossy methods, and have
| to be converted to JPEG to reach a sane size, or regenerated
| as independent layers that use different compression.
| karim79 wrote:
| The https://kraken.io service has an API, and there's a gulp
| and grunt plugin and a bunch of other stuff here:
| https://kraken.io/docs/integration-libraries
| djxfade wrote:
| Kraken never answers any support requests, and I have
| experienced outages without any information. I would not rely
| on them.
| blowski wrote:
| That looks perfect, except for it being a SaaS product with a
| minimum monthly spend. Is there a similar pay-as-you-go
| model, or ideally self-hosted?
| karim79 wrote:
| Good question - not at the moment, but we're evaluating the
| idea. Perhaps the answer would be to just make the build
| pipeline tools free for developers to a certain limit.
| aargh_aargh wrote:
| Wait, why not optipng [1], pngcrush [2], or any of the handful
| of other locally running lossless PNG optimizers [3]?
|
| [1] https://packages.debian.org/unstable/optipng
|
| [2] https://packages.debian.org/unstable/pngcrush
|
| [3] http://www.olegkikin.com/png_optimizers/
| blowski wrote:
| Do they do what the article is recommending? Genuine
| question, I don't know the answer.
| TonyTrapp wrote:
| They all optimize PNGs in different ways. For some files
| one tool is the best, for others another tool. There are
| even meta-tools like PNGGauntlet that just brute-force
| the compression with several of these tools (e.g. pngout
| and optipng) at the same time. That way, you can be
| relatively sure that your file will be close to being the
| smallest possible.
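|
| The meta-tool idea is simple enough to sketch yourself
| (Python; the tool flags are typical invocations, treat them
| as assumptions, and "in.png" is a placeholder):
|
|     import os, subprocess, tempfile
|
|     src = best = "in.png"
|     tools = {
|         "optipng": ["optipng", "-o7", src, "-out"],
|         "pngcrush": ["pngcrush", "-brute", src],
|     }
|     # Run each optimizer, keep whichever output is smallest.
|     for name, cmd in tools.items():
|         out = os.path.join(tempfile.mkdtemp(), name + ".png")
|         subprocess.run(cmd + [out], check=False)
|         if (os.path.exists(out) and
|                 os.path.getsize(out) < os.path.getsize(best)):
|             best = out
|     print("smallest:", best, os.path.getsize(best), "bytes")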
| sumtechguy wrote:
| If you are in the Windows world I have been using this:
| https://sourceforge.net/projects/nikkhokkho/files/FileOptimi...
|
| Which contains all of those and just tries them all. Been
| meaning to pop it all out and make a bash script of them
| to automate some of my stuff on a cron job instead of
| using a GUI.
| virtue3 wrote:
| I've always just used this (it's more old school but def
| solid):
|
| https://pmt.sourceforge.io/pngcrush/
| sprash wrote:
| An additional strategy for losslessly reducing PNG sizes, not
| mentioned in the article, is brute-force optimization of the
| DEFLATE stream (via zopflipng). This usually gives you a few
| percent improvement for "free".
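|
| Usage is a one-liner (sketch; "-m" asks zopflipng for more
| iterations, and the filenames are placeholders):
|
|     import subprocess
|
|     # Re-encodes the IDAT stream with Zopfli's exhaustive
|     # DEFLATE matcher; the image decodes bit-identically.
|     subprocess.run(["zopflipng", "-m", "in.png", "out.png"],
|                    check=True)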
| huachimingo wrote:
| Pico-8 builds its cartridge format on a custom PNG.
| userbinator wrote:
| _Finding the optimal filter for each line is a science in itself,
| which, of course, could be approached with brute computer power.
| But since you probably do not want to wait a week in front of
| your computer, until the PNG is finally calculated, the filters
| are selected by experience values and assumptions._
|
| That paragraph makes it sound like this article was written
| two decades ago. pngcrush, which does brute-force, still takes
| only a minute or two on average (less than screen-sized)
| files.
| nayuki wrote:
| PNG defines 5 possible filter methods per row: 0 = None, 1 =
| Sub, 2 = Up, 3 = Average, 4 = Paeth. Brute-forcing the filter
| search is exponential time, at Θ(5^imageHeight).
| https://www.w3.org/TR/2003/REC-PNG-20031110/#9Filters
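|
| For reference, the only non-obvious one of the five is Paeth;
| its predictor is tiny (transcribed from the spec's pseudocode
| into Python):
|
|     def paeth(a, b, c):
|         # a = left, b = above, c = upper-left. Pick whichever
|         # neighbor is closest to the estimate a + b - c.
|         p = a + b - c
|         pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
|         if pa <= pb and pa <= pc:
|             return a
|         if pb <= pc:
|             return b
|         return c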
| tialaramex wrote:
| pngcrush doesn't actually brute force, for _exactly the reason
| the article gives_, it would take too long.
|
| Instead, pngcrush is using yet more heuristics to guess what
| else is likely to work and then trying some of the other
| options that wouldn't be picked by the defaults.
|
| By default it only tries a few approaches that it deems _most
| likely_ to work, but even with the "brute" parameter it does
| not truly try everything; again, that would take too long, and
| pngcrush "brute" is already very slow. It just tries all the
| methods it would ever have used on any image.
|
| Actual brute force of a relatively tiny 32x32 image would
| require trying all five filters for each line _in combination_
| with each such filter for each other line, or else you might
| miss a combination which, astonishingly, was smaller even
| though it wasn't immediately obvious why. I think that's 5
| raised to the power 32 combinations.
| magicalhippo wrote:
| It could also use a look-ahead scheme. That is, cloning the
| compression state, compressing the current line with each of
| the different filters, retaining the state that led to the
| highest compression, and discarding the others.
|
| This could be trivially extended to multiple lines of look-
| ahead, though at exponential memory and CPU cost.
|
| Probably dog slow though.
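|
| Python's zlib actually lets you clone compressor state, so a
| one-row version is easy to sketch (filter_row() is a stand-in
| for the five PNG filters, not a real function):
|
|     import zlib
|
|     def choose_filter(co, row, prev, filters):
|         best = None
|         for f in filters:
|             trial = co.copy()  # clone the DEFLATE state
|             out = trial.compress(filter_row(f, row, prev))
|             out += trial.flush(zlib.Z_SYNC_FLUSH)
|             if best is None or len(out) < best[0]:
|                 best = (len(out), f)
|         # The filter whose clone flushed the fewest bytes wins.
|         return best[1]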
| tialaramex wrote:
| The simplest of these schemes is pretty fast, only five
| times slower than the naive approach. _But_ it absolutely
| doesn't give you the benefit of trying all the combinations,
| and I expect it will produce worse results than existing
| tools.
|
| The problem is, while you know which outputs compressed
| best _so far_, you can't be sure which will compress best
| in combination with future outputs you haven't yet
| filtered.
|
| Imagine for example that one filter for the current row
| gives what seems like a random sequence that doesn't
| compress very well. Your proposed scheme rejects this
| filter: it compressed poorly, so don't use it. But what if
| it turns out that every future row can be compressed to the
| _exact same sequence_ with some other filter? Deflate can
| compress the rows _extremely well_ together, even though
| the first one on its own did not compress.
| userbinator wrote:
| I haven't looked into the theory of it much, but wouldn't
| the local minimum also be the global one?
|
| In your example, even if the first row is more poorly
| compressed with one filter, could future rows "catch up"
| the loss? Unless you have a specific counterexample, I
| don't think it can, because the filtration operates
| before compression and there's no feedback as such.
| tialaramex wrote:
| > I don't think it can, because the filtration operates
| before compression and there's no feedback as such.
|
| The compressor consumes the _entire_ filtered data. You
| may have been thinking it goes something like
|
| Compress(Filter(row1, method=2)) .
| Compress(Filter(row2, method=4)) .
| Compress(Filter(row3, method=1)) and so on.
|
| But that would be terribly wasteful. What they do is
| actually closer to
|
| Compress(Filter(row1, method=2) . Filter(row2, method=4)
| . Filter(row3, method=1)) and so on.
|
| Does that make it clearer that if the _results_ of two
| Filter steps are more or less similar, even if those
| results compress badly on their own, it helps them
| compress _better_ when they're both nearby?
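|
| You can check the effect in two lines of Python: a random
| "row" barely compresses alone, but two copies back to back
| compress to little more than one, because the second copy
| is a single long match.
|
|     import os, zlib
|
|     row = os.urandom(1000)  # incompressible on its own
|     print(len(zlib.compress(row, 9)))        # ~1000 bytes
|     print(len(zlib.compress(row + row, 9)))  # ~1010, not ~2000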
| magicalhippo wrote:
| Obviously local approaches can fail to find the global
| minimum. This is mitigated somewhat, I imagine, by the
| limited 32 kB window deflate has.
|
| Would be a fun weekend project...
| [deleted]
___________________________________________________________________
(page generated 2021-05-26 23:02 UTC)