[HN Gopher] The world's smallest PNG
___________________________________________________________________
The world's smallest PNG
Author : thunderbong
Score : 86 points
Date : 2024-02-02 08:46 UTC (1 days ago)
(HTM) web link (evanhahn.com)
(TXT) w3m dump (evanhahn.com)
| WhatsName wrote:
| I wonder if there is some leeway in common implementations to
| shave off even more bytes. Like php allows to omit the end tag.
| nayuki wrote:
| I could see implementations tolerating a lack of DEFLATE
| Adler-32, lack of PNG chunk CRC-32, and lack of PNG IEND chunk.
| tambourine_man wrote:
| Right, non valid, strictly speaking, yet perfectly parsed by
| most interpreters. That's where the magic happens
| basilgohar wrote:
| Good example, but in the case of PHP, emitting the ending tag
| is completely acceptable by the standard and is a common best
| practice as well, especially if your final content in the file
| is PHP code and not, for example, HTML.
|
| I think what the you're suggesting is more like stretching the
| rules and perhaps going outside of the standard a little, but
| in a way that most implementations would be able to handle.
| michaelt wrote:
| There's a real-world application for this sort of things - many
| raster slippy maps send represent empty ocean with carefully
| optimised single-colour PNG files [1]
|
| (of course, vector maps are a different matter)
|
| [1] https://www.mjt.me.uk/posts/smallest-png/
| zoky wrote:
| I once wrote a PL/pgSQL function to return a 1x1 BMP file given
| arbitrary R, G, and B values, to display a background color for
| FileMaker records. Still use it to this day as a matter of
| fact.
| zamadatix wrote:
| I wonder how many of these are actually carefully optimized
| single-colour PNG files vs... just PNG files. Going into
| Photopea, creating a 256x256 solid blue image, and hitting save
| gave me a 137 byte image. That'd be a 2nd place in that list.
| Doing the same in MS Paint gets 5th, beating out the last 2.
|
| What seems more likely to me is someone didn't consider the
| compression settings problem worth optimizing at all (the last
| 2) or put some work into making sure the images were generally
| encoded with decent settings (the first 4). After all,
| optimising your best case which happens to be one of the very
| few images that can be cached to be reused across multiple
| tiles to be a few bytes smaller is probably the least efficient
| way to spend your "efficiency increasing time".
|
| Similarly I wonder if saving a few bytes to encode a 1x1 you
| have the browser upscale is actually an anti-optimisation in
| that a few extra bytes transferred might be the faster overall
| method. Would need benchmarking to tell, but again it'd be
| benchmarking to test how to make the least problematic
| performance ever so slightly better.
| leeoniya wrote:
| now do it in QOI :)
|
| https://phoboslab.org/log/2021/11/qoi-fast-lossless-image-co...
|
| https://news.ycombinator.com/item?id=34035024
| edflsafoiewq wrote:
| AFAICT the spec does not exclude zero-size images, which would
| make the smallest image 22 bytes (14 bytes header + 8 byte end
| marker).
|
| Otherwise the pixel in a 1x1 image can be encoded in 1 byte,
| giving 23 bytes total.
| lifthrasiir wrote:
| That's not too interesting and has been covered too many times
| (e.g. [1], [2]). Maybe it's much more interesting to compare
| the smallest possible file for a particular pattern like a 2x2
| B&W checkerboard.
|
| [1] https://shkspr.mobi/blog/2024/01/whats-the-smallest-file-
| siz...
|
| [2] https://github.com/mathiasbynens/small/
| nayuki wrote:
| This diagram would serve as a helpful companion to the article:
|
| https://www.nayuki.io/res/dumb-png-output-java/png-file-form...
|
| > I won't go in depth on DEFLATE here (in part because I am not
| an expert^4)
|
| > 4. If you are an expert, please contact me. I want to learn!!
|
| You already linked to "An Explanation of the DEFLATE Algorithm",
| which shows that you've been searching on the web already. I'm
| not sure if it'll make a difference, but here are a few more
| resources: https://www.euccas.me/zlib/ ,
| https://en.wikipedia.org/wiki/Deflate
| edflsafoiewq wrote:
| DEFLATE is basically huffman(lz(X)), which is obvious enough.
| The part that no one ever seems to motivate is how _precisely_
| you fit those together, ie. there is one tree for literals
| /length, another for distances, plus the extra bits.
| nayuki wrote:
| > how _precisely_ you fit those together
|
| I mean, read the DEFLATE spec; it's rather short compared to
| modern formats (LZMA, Brotli, Zstandard, etc.).
|
| The RFC version is "16 pages" long: https://www.rfc-
| editor.org/rfc/rfc1951.html . And here's my fancy HTML
| version: https://www.nayuki.io/page/deflate-
| specification-v1-3-html .
| edflsafoiewq wrote:
| I know _how_ it works. Like I said, I don 't know the
| motivation for why that is the best way to connect LZ and
| Huffman coding together.
| lifthrasiir wrote:
| I should note that it's hardly the best way, but it's
| easier to think DEFLATE as a layered algorithm: you catch
| repetitions via LZSS and code the remaining information
| with Huffman. You have two kinds of code because they
| have a very different distribution so it's beneficial to
| split them (and it's not surprising to have tens or
| hundreds of distributions in more sophiscated formats).
|
| And extra bits are there because longer distances in LZSS
| are typically opportunistic so individual values have a
| low frequency (i.e. Zipfian). So exact distance 1280 and
| 1281 can appear only once, but maybe distances 1200--1299
| appear frequently enough that you can have a distinct
| code for that plus two-digit adjustments. There are much
| more other ways to model distance distributions; for
| example, a set of codes for most recently used distances
| is common but DEFLATE is too old for that.
| snet0 wrote:
| At first I thought this was kind of pointless, but I actually
| think this is a decent way to get to the basic principles of a
| file type. A sort of minimal-working-example of a file, giving
| you an overview of how the file works and what is required to get
| something _valid_ , if uninteresting.
| systems_glitch wrote:
| Yeah, this is usually the kind of thing I end up implementing
| on a first pass for something where I don't want to just use a
| library (usually "for fun" hacking projects).
| dvh wrote:
| I use it to get rid of "favicon.ico not found" warning in
| console.
| beauHD wrote:
| 1TB favicons dumped in the root is an interesting problem for
| crawlers. Did they make the effort of dropping the connection
| in a timeout or proceed to gobble up the entire thing?
| Recursing wrote:
| See also: The Biggest Smallest PNG
| https://news.ycombinator.com/item?id=38908956
|
| And "The smallest 256x256 single-color PNG file, and where you've
| seen it" https://www.mjt.me.uk/posts/smallest-png/
| JKCalhoun wrote:
| > Every single PNG, including this one, starts with the same 8
| bytes.
|
| This might be common on other platforms as well, but I know that
| Apple uses a kind of series of tests to determine the type an
| image is rather than trusting the extension.
|
| As an experiment, change a PNG to have a .JPEG extension on MacOS
| and double-click. Preview, unsurprisingly, launches -- but if you
| get Info on the image in Preview it indicates it is a PNG. Of
| course.
| tiagod wrote:
| UNIX systems, like Mac OS and Linux, use file[0] (or libmagic
| behind it), to determine the format of the file. (or shebang,
| if the file has one.)
|
| [0] https://en.m.wikipedia.org/wiki/File_(command)
| wongarsu wrote:
| File extensions were the primary way to determine file type
| in VMS and CP/M. Consequently MS-DOS and Windows inherited
| that behavior.
|
| As you said, UNIX always went with heuristics based on the
| file content instead (and is the primary reason many file
| formats have fixed bytes to aid that process).
|
| And the web standarized on mime types transferred in the
| metadata, with how to set them correctly left as an exercise
| to the reader. Which usually means they are set based on file
| extension because that's faster.
| JKCalhoun wrote:
| > You have some wiggle room but chunks have a specific order. For
| example, the image metadata chunk has to appear before the pixel
| data chunk.
|
| Probably to make streaming easier. If you know the size early,
| HTML layout can anticipate the data that should eventually
| follow.
| sp332 wrote:
| Not only that, but so the decoder knows how much memory to
| allocate.
| elpocko wrote:
| MIF (my own image format) accepts an empty file as a valid 0x0
| image. It's very efficient :)
| mark-r wrote:
| Yes, but 0x0 isn't terribly useful. I think a 1x1 black (or
| transparent?) image would be better.
| adrianmonk wrote:
| 0x0 sounds pretty useful for web tracking pixels! It's even
| more unobtrusive than 1x1.
| mark-r wrote:
| Compare with the smallest GIF and other formats:
| https://news.ycombinator.com/item?id=38878480
| wongarsu wrote:
| The article kind of glosses over the PNG signature. There is
| actually a lot of thought behind using these exact 8 bytes:
|
| The first byte is 0x89, meant to tell any text editor that this
| isn't a text file, and to prevent text files from being
| recognized as PNG. It also detects if you mistakenly transferred
| the file in some ASCII file mode that clears the top bit.
|
| The next three bytes are PNG in ASCII, meant for humans looking
| at it in a hex editor.
|
| The next two bytes are a Windows-style line ending (CR LF, or 0D
| 0A). This breaks the signature if the file mistakenly undergoes
| Windows->Unix line conversion
|
| The next byte is 0x1A - End of File. This causes DOS tools to
| stop printing the file.
|
| The last byte is a Unix-style line break, to detect Unix->Windows
| line conversions
|
| To make the validation process complete, a process can check the
| integrity of the last chunk (IEND). Not only does it provide a
| fixed sequence of 12 bytes you can use to check that the file is
| complete, the fact that its length field is 0 allows you to
| verify that zero bytes were transferred correctly. Of course you
| would likely notice anyways because of the chunk checksums, but
| IEND allows you to fail fast and makes it easier to diagnose the
| actual issue.
|
| http://libpng.org/pub/png/spec/1.0/PNG-Rationale.html#R.PNG-...
| jraph wrote:
| Interesting, clever and beautiful. Now, why did they have to be
| so defensive against corruption? What purpose does it serve?
|
| If I had to detect corruption I would probably use something
| like CRC.
| MrCheeze wrote:
| So the one thing this article doesn't explicitly justify is
| whether the ten-byte zlib string is truly the shortest possible.
| You could imagine that it might be possible to hand-craft a
| DEFLATE block which is only three bytes instead of four, and yet
| still decompresses to at least two bytes (the minimum
| decompressed size of a PNG scanline).
|
| But by exhaustive search of all DEFLATE blocks that are one to
| three bytes in length, I can confirm the article is correct. All
| of them decompress to (at most) one byte of data.
|
| Which isn't a surprising result, but I wasn't actually sure of it
| before trying it out.
| jokoon wrote:
| I wonder if it's possible to detect features in an image, and
| encode those to compress an image, a bit like what image-to-SVG
| converters do, but better.
|
| Feature detection is a big subject, and I don't know if it would
| work for image compression.
___________________________________________________________________
(page generated 2024-02-03 23:00 UTC)