[HN Gopher] LZW and GIF explained
       ___________________________________________________________________
        
       LZW and GIF explained
        
       Author : networked
       Score  : 79 points
       Date   : 2024-05-24 07:43 UTC (1 days ago)
        
 (HTM) web link (www.eecis.udel.edu)
 (TXT) w3m dump (www.eecis.udel.edu)
        
       | mjevans wrote:
       | That's probably an OK writeup but seems focused on someone who's
       | good at learning without doodles and pictures. A visual /
       | kinesthetic variation would be nice, even if it's just ASCII art
       | of the data patterns.
        
         | duskwuff wrote:
         | A comment in the HTML source says this page was last modified
         | in 1997. I think I can forgive the author for not including
         | diagrams.
        
           | robinsonb5 wrote:
           | Yeah, .gif files weren't around back then. ;)
        
           | mjevans wrote:
           | Surely PRE blocks existed in 1997. "even if it's just ASCII
           | art of the data patterns."
        
       | salamo wrote:
       | I found out the other day that zstd supports dictionary
       | compression [0]. It's a pretty neat concept: you can "train" a
       | dictionary on your corpus of text, and use this dictionary to
       | compress subsequent documents. This is really useful if you have
       | a lot of small documents with a similar distribution of
       | characters. You only have to transmit the dictionary once over
       | the wire once, and then clients can get great compression ratios
       | after that.
       | 
       | For example, suppose you have a lot of chess PGNs you'd like to
       | store [1]. If you compress each PGN separately, you won't get
       | nearly as high of a compression ratio as if you stored them all
       | in the same file and compressed that file. But storing games
       | individually is more convenient. Here are the numbers:
       | 
       | Original PGN file: 780 bytes
       | 
       | Default zstd compression: 508 bytes (35% decrease)
       | 
       | With dictionary: 376 bytes (52% decrease)
       | 
       | I'm thinking about ways to compress at an even higher ratio.
       | Chess PGNs are a great candidate for this task because they have
       | a huge amount of redundant and repeated text.
       | 
       | [0] https://news.ycombinator.com/item?id=35917279
       | 
       | [1] https://en.wikipedia.org/wiki/Portable_Game_Notation
        
         | kevingadd wrote:
         | The shared dictionary technique you describe is presently used
         | for modern versions of HTTPS and for Brotli, IIRC. Very
         | effective.
        
           | follower wrote:
           | ...though with the slightly unexpected side effect (for
           | Brotli, at least) that your executable may end up containing
           | (~200KB, from memory) of _very_ unexpected[-1] plain text
           | strings which might ( & has[0]) lead to questions from
           | software end-users asking why your software contains
           | "random"[1] text (including potentially "culturally
           | sensitive" words/phrases related to religion such as "Holy
           | Roman Emperor", "Muslims", "dollars", "emacs"[2] or similar).
           | 
           | (I encountered this aspect while investigating potential size
           | optimization opportunities for the Godot game engine's
           | web/WASM builds--though presumably the Brotli dictionary
           | compresses well if the transfer encoding is... Brotli. :D )
           | 
           | ---- footnotes ----
           | 
           | [-1] An aspect also previously mentioned on HN:
           | https://news.ycombinator.com/item?id=27160590
           | 
           | [0] "This needs to be reviewed immediately #876":
           | https://github.com/google/brotli/issues/876
           | 
           | [1] Which, regardless of meaning, certainly bears
           | similarities to the type of "unexpected weird text"
           | commonly/normally associated with spam, malware, LLMs and
           | other entities of ill repute.
           | 
           | [2] The final example may not actually be factual. :)
        
             | RicoElectrico wrote:
             | Just XOR it with 0x55 ;)
        
               | follower wrote:
               | "Pros always use this one weird trick Brotli compression
               | hates." :)
        
       | follower wrote:
       | My personal favourite GIF-related explainer:
       | 
       | * "What's In A GIF":
       | https://www.matthewflickinger.com/lab/whatsinagif/
       | 
       | It's a five part explanation that includes both great
       | visualisations and bespoke analysis tools[2].
       | 
       | Here's a direct link to the section on LZW compression[0]:
       | 
       | * https://www.matthewflickinger.com/lab/whatsinagif/lzw_image_...
       | 
       | [0] Which somewhat confusingly credits "John Barkaus's LZW and
       | GIF Explained" which _actually_ turns out[1] to be... the text of
       | "LZW and GIF explained----Steve Blackstock" via a news group post
       | by John Barkaus (and is the text at OP's HN post URL).
       | 
       | [1]
       | http://web.archive.org/web/20050217131148/http://www.danbbs....
       | 
       | [2] Including this _fantastic_ example of visualizing individual
       | sections of a multi-part binary file format, the UX /UI of which
       | I think would be a great addition to a "hex viewer" application:
       | https://www.matthewflickinger.com/lab/whatsinagif/gif_explor...
        
       ___________________________________________________________________
       (page generated 2024-05-25 23:02 UTC)