[HN Gopher] The world's smallest PNG
       ___________________________________________________________________
        
       The world's smallest PNG
        
       Author : thunderbong
       Score  : 86 points
       Date   : 2024-02-02 08:46 UTC (1 days ago)
        
 (HTM) web link (evanhahn.com)
 (TXT) w3m dump (evanhahn.com)
        
       | WhatsName wrote:
       | I wonder if there is some leeway in common implementations to
       | shave off even more bytes. Like php allows to omit the end tag.
        
         | nayuki wrote:
         | I could see implementations tolerating a lack of DEFLATE
         | Adler-32, lack of PNG chunk CRC-32, and lack of PNG IEND chunk.
        
         | tambourine_man wrote:
         | Right, non valid, strictly speaking, yet perfectly parsed by
         | most interpreters. That's where the magic happens
        
         | basilgohar wrote:
         | Good example, but in the case of PHP, emitting the ending tag
         | is completely acceptable by the standard and is a common best
         | practice as well, especially if your final content in the file
         | is PHP code and not, for example, HTML.
         | 
         | I think what the you're suggesting is more like stretching the
         | rules and perhaps going outside of the standard a little, but
         | in a way that most implementations would be able to handle.
        
       | michaelt wrote:
       | There's a real-world application for this sort of things - many
       | raster slippy maps send represent empty ocean with carefully
       | optimised single-colour PNG files [1]
       | 
       | (of course, vector maps are a different matter)
       | 
       | [1] https://www.mjt.me.uk/posts/smallest-png/
        
         | zoky wrote:
         | I once wrote a PL/pgSQL function to return a 1x1 BMP file given
         | arbitrary R, G, and B values, to display a background color for
         | FileMaker records. Still use it to this day as a matter of
         | fact.
        
         | zamadatix wrote:
         | I wonder how many of these are actually carefully optimized
         | single-colour PNG files vs... just PNG files. Going into
         | Photopea, creating a 256x256 solid blue image, and hitting save
         | gave me a 137 byte image. That'd be a 2nd place in that list.
         | Doing the same in MS Paint gets 5th, beating out the last 2.
         | 
         | What seems more likely to me is someone didn't consider the
         | compression settings problem worth optimizing at all (the last
         | 2) or put some work into making sure the images were generally
         | encoded with decent settings (the first 4). After all,
         | optimising your best case which happens to be one of the very
         | few images that can be cached to be reused across multiple
         | tiles to be a few bytes smaller is probably the least efficient
         | way to spend your "efficiency increasing time".
         | 
         | Similarly I wonder if saving a few bytes to encode a 1x1 you
         | have the browser upscale is actually an anti-optimisation in
         | that a few extra bytes transferred might be the faster overall
         | method. Would need benchmarking to tell, but again it'd be
         | benchmarking to test how to make the least problematic
         | performance ever so slightly better.
        
       | leeoniya wrote:
       | now do it in QOI :)
       | 
       | https://phoboslab.org/log/2021/11/qoi-fast-lossless-image-co...
       | 
       | https://news.ycombinator.com/item?id=34035024
        
         | edflsafoiewq wrote:
         | AFAICT the spec does not exclude zero-size images, which would
         | make the smallest image 22 bytes (14 bytes header + 8 byte end
         | marker).
         | 
         | Otherwise the pixel in a 1x1 image can be encoded in 1 byte,
         | giving 23 bytes total.
        
         | lifthrasiir wrote:
         | That's not too interesting and has been covered too many times
         | (e.g. [1], [2]). Maybe it's much more interesting to compare
         | the smallest possible file for a particular pattern like a 2x2
         | B&W checkerboard.
         | 
         | [1] https://shkspr.mobi/blog/2024/01/whats-the-smallest-file-
         | siz...
         | 
         | [2] https://github.com/mathiasbynens/small/
        
       | nayuki wrote:
       | This diagram would serve as a helpful companion to the article:
       | 
       | https://www.nayuki.io/res/dumb-png-output-java/png-file-form...
       | 
       | > I won't go in depth on DEFLATE here (in part because I am not
       | an expert^4)
       | 
       | > 4. If you are an expert, please contact me. I want to learn!!
       | 
       | You already linked to "An Explanation of the DEFLATE Algorithm",
       | which shows that you've been searching on the web already. I'm
       | not sure if it'll make a difference, but here are a few more
       | resources: https://www.euccas.me/zlib/ ,
       | https://en.wikipedia.org/wiki/Deflate
        
         | edflsafoiewq wrote:
         | DEFLATE is basically huffman(lz(X)), which is obvious enough.
         | The part that no one ever seems to motivate is how _precisely_
         | you fit those together, ie. there is one tree for literals
         | /length, another for distances, plus the extra bits.
        
           | nayuki wrote:
           | > how _precisely_ you fit those together
           | 
           | I mean, read the DEFLATE spec; it's rather short compared to
           | modern formats (LZMA, Brotli, Zstandard, etc.).
           | 
           | The RFC version is "16 pages" long: https://www.rfc-
           | editor.org/rfc/rfc1951.html . And here's my fancy HTML
           | version: https://www.nayuki.io/page/deflate-
           | specification-v1-3-html .
        
             | edflsafoiewq wrote:
             | I know _how_ it works. Like I said, I don 't know the
             | motivation for why that is the best way to connect LZ and
             | Huffman coding together.
        
               | lifthrasiir wrote:
               | I should note that it's hardly the best way, but it's
               | easier to think DEFLATE as a layered algorithm: you catch
               | repetitions via LZSS and code the remaining information
               | with Huffman. You have two kinds of code because they
               | have a very different distribution so it's beneficial to
               | split them (and it's not surprising to have tens or
               | hundreds of distributions in more sophiscated formats).
               | 
               | And extra bits are there because longer distances in LZSS
               | are typically opportunistic so individual values have a
               | low frequency (i.e. Zipfian). So exact distance 1280 and
               | 1281 can appear only once, but maybe distances 1200--1299
               | appear frequently enough that you can have a distinct
               | code for that plus two-digit adjustments. There are much
               | more other ways to model distance distributions; for
               | example, a set of codes for most recently used distances
               | is common but DEFLATE is too old for that.
        
       | snet0 wrote:
       | At first I thought this was kind of pointless, but I actually
       | think this is a decent way to get to the basic principles of a
       | file type. A sort of minimal-working-example of a file, giving
       | you an overview of how the file works and what is required to get
       | something _valid_ , if uninteresting.
        
         | systems_glitch wrote:
         | Yeah, this is usually the kind of thing I end up implementing
         | on a first pass for something where I don't want to just use a
         | library (usually "for fun" hacking projects).
        
         | dvh wrote:
         | I use it to get rid of "favicon.ico not found" warning in
         | console.
        
           | beauHD wrote:
           | 1TB favicons dumped in the root is an interesting problem for
           | crawlers. Did they make the effort of dropping the connection
           | in a timeout or proceed to gobble up the entire thing?
        
       | Recursing wrote:
       | See also: The Biggest Smallest PNG
       | https://news.ycombinator.com/item?id=38908956
       | 
       | And "The smallest 256x256 single-color PNG file, and where you've
       | seen it" https://www.mjt.me.uk/posts/smallest-png/
        
       | JKCalhoun wrote:
       | > Every single PNG, including this one, starts with the same 8
       | bytes.
       | 
       | This might be common on other platforms as well, but I know that
       | Apple uses a kind of series of tests to determine the type an
       | image is rather than trusting the extension.
       | 
       | As an experiment, change a PNG to have a .JPEG extension on MacOS
       | and double-click. Preview, unsurprisingly, launches -- but if you
       | get Info on the image in Preview it indicates it is a PNG. Of
       | course.
        
         | tiagod wrote:
         | UNIX systems, like Mac OS and Linux, use file[0] (or libmagic
         | behind it), to determine the format of the file. (or shebang,
         | if the file has one.)
         | 
         | [0] https://en.m.wikipedia.org/wiki/File_(command)
        
           | wongarsu wrote:
           | File extensions were the primary way to determine file type
           | in VMS and CP/M. Consequently MS-DOS and Windows inherited
           | that behavior.
           | 
           | As you said, UNIX always went with heuristics based on the
           | file content instead (and is the primary reason many file
           | formats have fixed bytes to aid that process).
           | 
           | And the web standarized on mime types transferred in the
           | metadata, with how to set them correctly left as an exercise
           | to the reader. Which usually means they are set based on file
           | extension because that's faster.
        
       | JKCalhoun wrote:
       | > You have some wiggle room but chunks have a specific order. For
       | example, the image metadata chunk has to appear before the pixel
       | data chunk.
       | 
       | Probably to make streaming easier. If you know the size early,
       | HTML layout can anticipate the data that should eventually
       | follow.
        
         | sp332 wrote:
         | Not only that, but so the decoder knows how much memory to
         | allocate.
        
       | elpocko wrote:
       | MIF (my own image format) accepts an empty file as a valid 0x0
       | image. It's very efficient :)
        
         | mark-r wrote:
         | Yes, but 0x0 isn't terribly useful. I think a 1x1 black (or
         | transparent?) image would be better.
        
           | adrianmonk wrote:
           | 0x0 sounds pretty useful for web tracking pixels! It's even
           | more unobtrusive than 1x1.
        
       | mark-r wrote:
       | Compare with the smallest GIF and other formats:
       | https://news.ycombinator.com/item?id=38878480
        
       | wongarsu wrote:
       | The article kind of glosses over the PNG signature. There is
       | actually a lot of thought behind using these exact 8 bytes:
       | 
       | The first byte is 0x89, meant to tell any text editor that this
       | isn't a text file, and to prevent text files from being
       | recognized as PNG. It also detects if you mistakenly transferred
       | the file in some ASCII file mode that clears the top bit.
       | 
       | The next three bytes are PNG in ASCII, meant for humans looking
       | at it in a hex editor.
       | 
       | The next two bytes are a Windows-style line ending (CR LF, or 0D
       | 0A). This breaks the signature if the file mistakenly undergoes
       | Windows->Unix line conversion
       | 
       | The next byte is 0x1A - End of File. This causes DOS tools to
       | stop printing the file.
       | 
       | The last byte is a Unix-style line break, to detect Unix->Windows
       | line conversions
       | 
       | To make the validation process complete, a process can check the
       | integrity of the last chunk (IEND). Not only does it provide a
       | fixed sequence of 12 bytes you can use to check that the file is
       | complete, the fact that its length field is 0 allows you to
       | verify that zero bytes were transferred correctly. Of course you
       | would likely notice anyways because of the chunk checksums, but
       | IEND allows you to fail fast and makes it easier to diagnose the
       | actual issue.
       | 
       | http://libpng.org/pub/png/spec/1.0/PNG-Rationale.html#R.PNG-...
        
         | jraph wrote:
         | Interesting, clever and beautiful. Now, why did they have to be
         | so defensive against corruption? What purpose does it serve?
         | 
         | If I had to detect corruption I would probably use something
         | like CRC.
        
       | MrCheeze wrote:
       | So the one thing this article doesn't explicitly justify is
       | whether the ten-byte zlib string is truly the shortest possible.
       | You could imagine that it might be possible to hand-craft a
       | DEFLATE block which is only three bytes instead of four, and yet
       | still decompresses to at least two bytes (the minimum
       | decompressed size of a PNG scanline).
       | 
       | But by exhaustive search of all DEFLATE blocks that are one to
       | three bytes in length, I can confirm the article is correct. All
       | of them decompress to (at most) one byte of data.
       | 
       | Which isn't a surprising result, but I wasn't actually sure of it
       | before trying it out.
        
       | jokoon wrote:
       | I wonder if it's possible to detect features in an image, and
       | encode those to compress an image, a bit like what image-to-SVG
       | converters do, but better.
       | 
       | Feature detection is a big subject, and I don't know if it would
       | work for image compression.
        
       ___________________________________________________________________
       (page generated 2024-02-03 23:00 UTC)