[HN Gopher] Zlib visualizer
       ___________________________________________________________________
        
       Zlib visualizer
        
       Author : elisaado
       Score  : 301 points
       Date   : 2025-09-25 15:19 UTC (4 days ago)
        
 (HTM) web link (lynn.github.io)
 (TXT) w3m dump (lynn.github.io)
        
       | PeakKS wrote:
       | Damn I tried the bee movie script, they got me.
        
         | NooneAtAll3 wrote:
         | got how?
        
           | thewisenerd wrote:
           | if you paste contents which contain a very particular string
           | ("intends to sue the human race for stealing our honey"), the
           | contents are replaced with the phrase "the bee movie script?
           | really? how original"
        
             | Heliodex wrote:
             | Also got got. I assume the Bee Movie script is the first
             | choice for a lot of people needing an ad-hoc big block of
             | text. It also compresses pretty well.
             | 
             | https://github.com/lynn/flateview/blob/2668beaa5cc8cae387b6
             | f...
        
               | razster wrote:
               | How is that a thing? Guess I shall go down the rabbit
               | hole.
        
               | luca4 wrote:
               | This is the first time I've heard about this too. There
               | goes the next hour.
        
               | foofoo12 wrote:
               | Please report back, as many of us don't have an hour. We
               | rely on soldiers like you! Thanks!
        
               | luca4 wrote:
               | It's mostly what Heliodex said: it's a copypasta for when
               | people need a big block of text. There's also a
               | compression meme around the Bee Movie and other movies
               | (on YouTube: "bee movie in 10s" or "it gets faster every
               | time X"). But contrary to what I first thought, it's not
               | a zlib-specific joke.
        
               | rcxdude wrote:
               | Yeah, the movie became a bit of a meme at some point and
               | somehow shoehorning in "the entire bee movie script" into
               | random places became a part of that.
        
               | mid-kid wrote:
               | What happened to lorem ipsum?
        
               | thewisenerd wrote:
               | not easily compressible, i guess?
        
             | oneeyedpigeon wrote:
             | It should really normalise whitespace before that check,
             | because the version of the script I found split the line :)
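A minimal sketch of the whitespace-normalising check suggested above.
This is not the tool's actual code: `contains_trigger` is a
hypothetical helper, and only the trigger phrase comes from the thread.

```python
import re

# The trigger phrase quoted earlier in the thread.
TRIGGER = "intends to sue the human race for stealing our honey"


def contains_trigger(text: str) -> bool:
    """Return True if `text` contains TRIGGER, however its whitespace is laid out."""
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    normalised = re.sub(r"\s+", " ", text)
    return TRIGGER in normalised


# A copy of the line that was wrapped mid-phrase, as described in the comment.
wrapped = "Barry intends to sue the human\nrace for  stealing our honey."
print(TRIGGER in wrapped)         # plain substring check misses the wrapped copy
print(contains_trigger(wrapped))  # normalised check finds it
```

Collapsing whitespace first means a hard-wrapped or double-spaced copy
of the script still matches.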
        
             | ape4 wrote:
             | Um, sorry, I don't really get it. Is "the bee movie script?
             | really? how original" a comment?
        
               | zamadatix wrote:
               | It's the resulting string the tool gives instead of the
               | actual compressed string info. You can see the result
               | directly by putting some text which contains "intends to
               | sue the human race for stealing our honey" into the input
               | text box.
        
               | ape4 wrote:
               | Thanks. So only for this tool - not zlib normally?
        
               | zamadatix wrote:
               | Yes https://github.com/lynn/flateview/blob/2668beaa5cc8ca
               | e387b6f...
        
       | quuxplusone wrote:
       | s/Z-Lib/zlib/
       | 
       | I wonder if this can be blamed on the HN title auto-shortener or
       | not...
        
         | userbinator wrote:
         | I was expecting something about how many books they had, so
         | this was a funny surprise. I do wonder if the naming was a
         | deliberate attempt at hiding, much like naming a torrent
         | tracker after the sound made by a pig.
        
       | duskwuff wrote:
       | Two and a half issues:
       | 
       | 1) The handling of dynamic blocks leaves something to be desired.
       | The parameters are left mostly undecoded. It'd be really neat if
       | the Huffman symbols were listed somewhere, rather than just being
       | left implicit.
       | 
       | 2) The visualization falls apart pretty badly for texts
       | consisting of more than one block (which tends to happen around
       | 32 KB) - symbols are still decoded, but references all show up
       | blank.
       | 
       | Large inputs make the page hang for a bit, but that's probably
       | pretty hard to avoid.
       | 
       | And as an enhancement: it'd be really cool if clicking on
       | backreferences would jump to the text being referenced.
        
         | 0d0a wrote:
         | Exactly, it misses out on explaining how the fixed Huffman
         | table is interpreted to apply symbol and distance codes, or how
         | dynamic tables are derived from the input itself. Sure, it's the
         | hardest part, but also the most interesting to visualize. As
         | another commenter pointed out, we are just left with mysterious
         | bit sequences for these codes.
         | 
         | It would be cool if we could supply our own Huffman table and
         | see how that affects the stream itself. We might want to put
         | our text right there! https://github.com/nevesnunes/deflate-
         | frolicking?tab=readme-...
        
           | cogman10 wrote:
           | I think this is something that makes a decent teaching aid
           | but doesn't work well for the uninitiated.
           | 
           | You need someone to spell out exactly what each of the
           | sections is and what it is doing.
        
       | Twirrim wrote:
       | As someone who's never really read that much on compression
       | stuff, I have absolutely zero clue what this visualisation is
       | actually showing me.
       | 
       | That's compounded by the lack of legend. What do the different
       | shades of blue and purple tell me? What is Orange?
       | 
       | E.g., for a given text it shows x4<-135 in an orange block. x4
       | _seems_ to indicate that the first 4 binary values for the block
       | are important, but I can't figure out what that 135 is
       | referencing (I assume it's some pointer to a value?)
        
         | lifthrasiir wrote:
         | It is a backreference, the main way of dealing with full or
         | partial repetitions in the LZ77 algorithm. It literally means:
         | copy 4 characters from the backward offset of 135. Note that
         | this "backward offset" can overlap previously repeated
         | characters, so x10<-1 equally means: copy the last character 10
         | times.
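The overlapping copy described above can be sketched in a few lines.
`copy_backreference` is a hypothetical helper name. The only trick is
copying one byte at a time, so that bytes written by the copy itself
can be read back.

```python
def copy_backreference(out: bytearray, length: int, distance: int) -> None:
    """Append `length` bytes, each read `distance` bytes behind the write position."""
    if not 1 <= distance <= len(out):
        raise ValueError("distance points before the start of the output")
    for _ in range(length):
        # Byte-by-byte copy: when distance < length, later iterations
        # re-read bytes that this same loop appended earlier.
        out.append(out[-distance])


run = bytearray(b"a")
copy_backreference(run, 10, 1)   # x10<-1: repeat the last byte 10 times
print(run)

alternating = bytearray(b"ab")
copy_backreference(alternating, 5, 2)  # x5<-2: keep cycling through "ab"
print(alternating)
```

A single bulk slice copy would not work here, because the source range
is not fully written when the copy starts.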
        
         | fwip wrote:
         | Using this example paragraph, at compression level 1 or higher
         | (copy with the quotation symbols):
         | 
         | "It was the best of times, it was the worst of times, it was
         | the age of wisdom, it was the age of foolishness, it was the
         | epoch of belief, it was the epoch of incredulity, it was the
         | season of light, it was the season of darkness, it was the
         | spring of hope, it was the winter of despair."
         | 
         | The red bit at the beginning is the zlib header and block
         | parameters. This basically tells the decoder the format of the
         | data coming up (compression method, window size, block type).
         | 
         | The following grey section is the huffman coding tables - more
         | common characters in the input are encoded in fewer bits. This
         | is what later tells the decoder that 000 means 'e' and 1110110
         | means 'I'.
         | 
         | Getting into the content now - this is where the decoder can
         | start emitting the uncompressed text. The first 3 purple
         | symbols are the UTF-8 bytes of the fancy opening quote -
         | because they're rare in this text, they're each encoded as 6 or
         | 7 bits. Because they take a lot of bits, this website is
         | showing them as a purple color, as well as physically wider.
         | The nearby 't' is encoded in 4 bits, 0110, and is represented
         | in a bluer color.
         | 
         | The orange bits you've mentioned are back references - "x10 <-
         | 26" here means "go back 26 characters in what you've decoded,
         | and then copy 10 characters again." In this way, we can
         | represent "t was the " in only 12 bits, because we've seen it
         | previously.
         | 
         | The grey at the end is a special "end of block" marker,
         | followed by a red checksum which allows decoders to make sure
         | there wasn't any corruption in the input.
         | 
         | I think that's everything. Further reading:
         | https://en.wikipedia.org/wiki/Zlib
         | https://en.wikipedia.org/wiki/Deflate
         | https://en.wikipedia.org/wiki/Huffman_coding
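To poke at the framing described above outside the visualizer, here is
a small sketch using Python's standard `zlib` module. The sample text
is a shortened version of the paragraph above with curly quotes added.
The exact compressed bytes depend on the zlib build, so the sketch
only inspects the header and the trailer.

```python
import struct
import zlib

# Shortened sample with curly quotes, encoded as UTF-8.
text = "\u201cIt was the best of times, it was the worst of times\u201d".encode("utf-8")
stream = zlib.compress(text, 9)

# Two-byte zlib header: CMF then FLG.
cmf, flg = stream[0], stream[1]
method = cmf & 0x0F                      # 8 means Deflate
window_bits = (cmf >> 4) + 8             # log2 of the LZ77 window size
header_ok = (cmf * 256 + flg) % 31 == 0  # FCHECK makes the pair a multiple of 31

# Four-byte trailer: Adler-32 of the uncompressed data, big-endian.
stored_checksum = struct.unpack(">I", stream[-4:])[0]

print(method, window_bits, header_ok)
print(stored_checksum == zlib.adler32(text))
print(text[:3])  # the opening curly quote occupies three bytes
```

Everything between the header and the trailer is the Deflate data the
visualizer colours in.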
        
       | ale42 wrote:
       | This is great! Just missing a way to understand how the
       | parameters are encoded, or is there something somewhere?
        
       | chmod775 wrote:
       | The byte counter seems broken somehow. "Compressing" a single
       | character with a compression level of 0 says "12 bytes", yet in
       | the visualization there's less than 8 bytes (~7.5).
       | 
       | When compressing with a level higher than 0, the bits also don't
       | appear to add up to a natural number of bytes, so I'm thinking
       | the visualization is missing some padding?
        
         | duskwuff wrote:
         | At least for me, compressing a single "a" at compression level
         | 0 gives me an output of 91 bits, which rounds up to 12 bytes.
        
           | chordbug wrote:
           | I fixed the bug after reading the comment you replied to :)
           | (I'm @lynn)
           | 
           | The LEN and NLEN items were not getting visualized.
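One accounting that reproduces both numbers above (91 bits, 12 bytes)
is to build the level-0 stream by hand. This is a sketch that assumes
a single final stored block. `stored_zlib_stream` is a hypothetical
helper, and the result is checked against `zlib.decompress` rather
than against the visualizer.

```python
import struct
import zlib


def stored_zlib_stream(data: bytes) -> bytes:
    """Wrap `data` (at most 65535 bytes) in a zlib stream holding one stored block."""
    if len(data) > 0xFFFF:
        raise ValueError("a stored block holds at most 65535 bytes")
    header = bytes([0x78, 0x01])  # CMF, FLG: Deflate, 32 KiB window
    block = bytes([0x01])         # BFINAL=1, BTYPE=00, then 5 padding bits
    # LEN and its one's complement NLEN, both little-endian 16-bit values.
    lengths = struct.pack("<HH", len(data), len(data) ^ 0xFFFF)
    checksum = struct.pack(">I", zlib.adler32(data))  # big-endian trailer
    return header + block + lengths + data + checksum


stream = stored_zlib_stream(b"a")
# header + block header + LEN + NLEN + data + checksum, without the padding
meaningful_bits = 16 + 3 + 16 + 16 + 8 + 32
print(len(stream), meaningful_bits)
print(zlib.decompress(stream))
```

The 5 padding bits that align LEN to a byte boundary account for the
gap between 91 bits and 12 whole bytes.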
        
       | jonjonsonjr wrote:
       | Something must be in the air. I've been working on a gzip/deflate
       | visualizer recently as well:
       | https://jonjohnsonjr.github.io/deflate/
       | 
       | This is very much a work in progress, but for folks looking for
       | a deeper explanation of how dynamic blocks are encoded, this is
       | my attempt to visualize them.
       | 
       | (This all happens locally with way too much wasm, so attempting
       | to upload a large gzip file will likely crash the tab.)
       | 
       | tl;dr for btype 2 blocks:
       | 
       | 3 bit block header.
       | 
       | Three values telling you how many extra (above the minimum
       | number) symbols are in each tree: HLIT, HDIST, and HCLEN.
       | 
       | First, we read (HCLEN + 4) * 3 bits.
       | 
       | These are the bit counts for symbols 0-18 in the code length
       | tree, which gives you the bit patterns for a little mini-language
       | used to compactly encode the literal/length and distance trees.
       | 0-15 are literal bit lengths (0 meaning it's omitted). 16 repeats
       | the previous symbol 3-6 times. 17 and 18 encode short (3-10) and
       | long (11-138) runs of zeroes, which is useful for encoding blocks
       | with sparse alphabets.
       | 
       | These bit counts are in a seemingly strange order that tries to
       | push less-likely bit counts towards the end of the list so it can
       | be truncated.
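A minimal sketch of reading the header fields described so far. The
class and function names are hypothetical. The symbol order is the one
given in RFC 1951, and the field widths (5, 5 and 4 bits) and minimum
counts (257, 1 and 4) also come from that specification.

```python
import zlib

# Order in which the code length code lengths are stored (RFC 1951).
CODE_LENGTH_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]


class BitReader:
    """Reads bits least-significant-bit first, as Deflate packs header fields."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, count: int) -> int:
        value = 0
        for i in range(count):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


def read_dynamic_header(reader: BitReader) -> dict:
    """Parse the block header up to and including the code length code lengths."""
    bfinal = reader.read(1)
    btype = reader.read(2)
    if btype != 2:
        raise ValueError("not a dynamic Huffman block")
    literal_length_codes = reader.read(5) + 257  # HLIT + 257
    distance_codes = reader.read(5) + 1          # HDIST + 1
    code_length_codes = reader.read(4) + 4       # HCLEN + 4
    lengths = [0] * 19                           # symbols not listed stay at 0
    for symbol in CODE_LENGTH_ORDER[:code_length_codes]:
        lengths[symbol] = reader.read(3)
    return {
        "bfinal": bfinal,
        "literal_length_codes": literal_length_codes,
        "distance_codes": distance_codes,
        "code_length_lengths": lengths,
    }


# Try it on real output; the compressor may pick another block type.
sample = b"it was the best of times, it was the worst of times, " * 4
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)  # negative wbits: raw Deflate
raw = compressor.compress(sample) + compressor.flush()
if (raw[0] >> 1) & 3 == 2:
    print(read_dynamic_header(BitReader(raw)))
```

Because truncated entries default to 0, the rarely used lengths at the
end of the order cost nothing when they are absent.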
       | 
       | Knowing all the bit lengths for values in this alphabet allows
       | you to reconstruct a huffman tree (thanks to canonical huffman
       | codes) and decode the bit patterns for these code length codes.
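The canonical-code reconstruction mentioned above follows the
procedure in RFC 1951. A sketch, with `canonical_codes` as a
hypothetical name and codes returned as bit strings for readability:

```python
from typing import Dict, Sequence


def canonical_codes(lengths: Sequence[int]) -> Dict[int, str]:
    """Map each symbol with a non-zero bit length to its canonical Huffman code."""
    max_len = max(lengths, default=0)

    # Count how many symbols use each code length (length 0 means unused).
    count = [0] * (max_len + 1)
    for length in lengths:
        if length:
            count[length] += 1

    # Work out the smallest code value for each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + count[bits - 1]) << 1
        next_code[bits] = code

    # Hand out consecutive codes in symbol order within each length.
    codes = {}
    for symbol, length in enumerate(lengths):
        if length:
            codes[symbol] = format(next_code[length], "0{}b".format(length))
            next_code[length] += 1
    return codes


# The worked example from RFC 1951: symbols A..H with these bit lengths.
example = canonical_codes([3, 3, 3, 3, 3, 2, 4, 4])
for symbol, bits in sorted(example.items()):
    print("ABCDEFGH"[symbol], bits)
```

Since the codes are fully determined by the lengths, the stream only
has to carry the lengths, never the bit patterns themselves.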
       | 
       | That's followed by a bitstream that you decode to get the bit
       | counts for the literal/length and distance trees. HLIT and HDIST
       | (from earlier) tell you how many of these to expect.
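A sketch of expanding that mini-language into a flat list of bit
lengths. It assumes the symbols have already been Huffman-decoded and
paired with the value of their extra bits (2 bits after 16, 3 after
17, 7 after 18). `expand_code_lengths` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def expand_code_lengths(tokens: Iterable[Tuple[int, int]], expected: int) -> List[int]:
    """Expand (symbol, extra_bits_value) pairs into `expected` bit lengths."""
    lengths: List[int] = []
    for symbol, extra in tokens:
        if 0 <= symbol <= 15:
            lengths.append(symbol)                        # literal bit length
        elif symbol == 16:
            if not lengths:
                raise ValueError("symbol 16 has no previous length to repeat")
            lengths.extend([lengths[-1]] * (3 + extra))   # repeat previous 3-6 times
        elif symbol == 17:
            lengths.extend([0] * (3 + extra))             # 3-10 zeroes
        elif symbol == 18:
            lengths.extend([0] * (11 + extra))            # 11-138 zeroes
        else:
            raise ValueError("unknown code length symbol {}".format(symbol))
    if len(lengths) != expected:
        raise ValueError("expanded to the wrong number of lengths")
    return lengths


# One 8, six more 8s, 138 zeroes, then a 5.
tokens = [(8, 0), (16, 3), (18, 127), (5, 0)]
expanded = expand_code_lengths(tokens, expected=146)
print(expanded[:8], len(expanded))
```

In a real block `expected` would be the combined number of
literal/length and distance codes announced in the header.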
       | 
       | Again, you can reconstruct these trees using just the bit lengths
       | thanks to canonical huffman codes, which gives you the bit
       | patterns for the data bitstream.
       | 
       | Then you just decode the rest of the bitstream (using LZSS) until
       | you hit 256, the end of block (EOB).
       | 
       | If you're not already familiar with deflate, don't be discouraged
       | if none of that made any sense. Bill Bird has an excellent (long)
       | lecture that I recommend to everyone:
       | https://www.youtube.com/watch?v=SJPvNi4HrWQ
        
       | DamonHD wrote:
       | I would really like to see one of these for brotli.
       | 
       | Also for zopfli vs level 9 compression with this tool as-is.
        
       ___________________________________________________________________
       (page generated 2025-09-29 23:01 UTC)