[HN Gopher] Zlib visualizer
___________________________________________________________________
Zlib visualizer
Author : elisaado
Score : 301 points
Date : 2025-09-25 15:19 UTC (4 days ago)
(HTM) web link (lynn.github.io)
| PeakKS wrote:
| Damn I tried the bee movie script, they got me.
| NooneAtAll3 wrote:
| got how?
| thewisenerd wrote:
| if you paste contents which contain a very particular string
| ("intends to sue the human race for stealing our honey"), the
| contents are replaced with the phrase "the bee movie script?
| really? how original"
| Heliodex wrote:
| Also got got. I assume the Bee Movie script is the first
| choice for a lot of people needing an ad-hoc big block of
| text. It also compresses pretty well.
|
| https://github.com/lynn/flateview/blob/2668beaa5cc8cae387b6
| f...
| razster wrote:
| How is that a thing? Guess I shall go down the rabbit
| hole.
| luca4 wrote:
| This is also the first time I've heard about this. There goes
| the next hour.
| foofoo12 wrote:
| Please report back, as many of us don't have an hour. We
| rely on soldiers like you! Thanks!
| luca4 wrote:
| It's mostly what Heliodex said: it's a copypasta for when
| people need a big block of text. There's also a compression
| meme around the Bee Movie and other movies (on YouTube: "bee
| movie in 10s" or "it gets faster every time X"). But unlike
| what I thought at first, it's not a zlib-specific joke.
| rcxdude wrote:
| Yeah, the movie became a bit of a meme at some point and
| somehow shoehorning in "the entire bee movie script" into
| random places became a part of that.
| mid-kid wrote:
| What happened to lorem ipsum?
| thewisenerd wrote:
| not easily compressible, i guess?
| oneeyedpigeon wrote:
| It should really normalise whitespace before that check,
| because the version of the script I found split the line :)
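A minimal sketch of the whitespace-normalising check suggested above. This is not the tool's actual code (the linked source is truncated in this dump); the function name `filter_input` and both constants are hypothetical, and only the two quoted strings come from the thread.

```python
import re

# Both strings are quoted from the thread; the names are hypothetical.
TRIGGER = "intends to sue the human race for stealing our honey"
REPLACEMENT = "the bee movie script? really? how original"


def filter_input(text: str) -> str:
    """Return REPLACEMENT if the trigger phrase occurs, ignoring line breaks."""
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space
    # so that a phrase split across lines still matches.
    normalized = re.sub(r"\s+", " ", text)
    return REPLACEMENT if TRIGGER in normalized else text


split_line = "Barry intends to sue the human\nrace for stealing   our honey."
filtered = filter_input(split_line)
untouched = filter_input("lorem ipsum")
```

Matching against the normalised copy while returning the original text keeps ordinary input byte-for-byte unchanged.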
| ape4 wrote:
| Um, sorry, I don't really get it. Is "the bee movie script?
| really? how original" a comment?
| zamadatix wrote:
| It's the resulting string the tool gives instead of the
| actual compressed string info. You can see the result
| directly by putting some text which contains "intends to
| sue the human race for stealing our honey" into the input
| text box.
| ape4 wrote:
| Thanks. So only for this tool - not zlib normally?
| zamadatix wrote:
| Yes https://github.com/lynn/flateview/blob/2668beaa5cc8ca
| e387b6f...
| quuxplusone wrote:
| s/Z-Lib/zlib/
|
| I wonder if this can be blamed on the HN title auto-shortener or
| not...
| userbinator wrote:
| I was expecting something about how many books they had, so
| this was a funny surprise. I do wonder if the naming was a
| deliberate attempt at hiding, much like naming a torrent
| tracker after the sound made by a pig.
| duskwuff wrote:
| Two and a half issues:
|
| 1) The handling of dynamic blocks leaves something to be desired.
| The parameters are left mostly undecoded. It'd be really neat if
| the Huffman symbols were listed somewhere, rather than just being
| left implicit.
|
| 2) The visualization falls apart pretty badly for texts
| consisting of more than one block (which tends to happen around
| 32 KB) - symbols are still decoded, but references all show up
| blank.
|
| Large inputs make the page hang for a bit, but that's probably
| pretty hard to avoid.
|
| And as an enhancement: it'd be really cool if clicking on
| backreferences would jump to the text being referenced.
| 0d0a wrote:
| Exactly, it misses out on explaining how the fixed Huffman
| table is interpreted to apply symbol and distance codes, or how
| dynamic tables are derived from the input itself. Sure, it's the
| hardest part, but also the most interesting to visualize. As
| another commenter pointed out, we are just left with mysterious
| bit sequences for these codes.
|
| It would be cool if we could supply our own Huffman table and
| see how that affects the stream itself. We might want to put
| our text right there! https://github.com/nevesnunes/deflate-
| frolicking?tab=readme-...
| cogman10 wrote:
| I think this is something that makes a decent teaching aid
| but doesn't work well for the uninitiated.
|
| You need someone to spell out exactly what each of the
| sections are and what they are doing.
| Twirrim wrote:
| As someone who's never really read that much on compression
| stuff, I have absolutely zero clue what this visualisation is
| actually showing me.
|
| That's compounded by the lack of legend. What do the different
| shades of blue and purple tell me? What is Orange?
|
| e.g. for a given text, an orange block shows x4<-135. x4
| _seems_ to indicate that the first 4 binary values for the block
| are important, but I can't figure out what that 135 is
| referencing (I assume it's some pointer to a value?)
| lifthrasiir wrote:
| It is a backreference, the main way of dealing with full or
| partial repetitions in the LZ77 algorithm. It literally means:
| copy 4 characters from the backward offset of 135. Note that
| this "backward offset" can overlap previously repeated
| characters, so x10<-1 equally means: copy the last character 10
| times.
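The overlapping copy described above can be sketched in a few lines of Python. The helper name `copy_backref` is made up for illustration; the point is only that copying one byte at a time lets a reference read bytes that the same reference has just written.

```python
def copy_backref(out: bytearray, length: int, distance: int) -> None:
    """Append `length` bytes, each read from `distance` bytes before the end."""
    if not 1 <= distance <= len(out):
        raise ValueError("distance points before the start of the output")
    for _ in range(length):
        # out grows on every iteration, so out[-distance] may be a byte
        # appended earlier in this same loop (the overlapping case).
        out.append(out[-distance])


# x10<-1: repeat the last character ten times.
run = bytearray(b"a")
copy_backref(run, 10, 1)

# x10<-26: reuse "t was the " from 26 bytes back.
phrase = bytearray(b"It was the best of times, i")
copy_backref(phrase, 10, 26)
```

After these calls `run` holds eleven `a` bytes and `phrase` ends with `it was the `.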
| fwip wrote:
| Using this example paragraph, at compression level 1 or higher
| (copy with the quotation symbols):
|
| "It was the best of times, it was the worst of times, it was
| the age of wisdom, it was the age of foolishness, it was the
| epoch of belief, it was the epoch of incredulity, it was the
| season of light, it was the season of darkness, it was the
| spring of hope, it was the winter of despair."
|
| The red bit at the beginning is zlib header information and
| parameters. This basically tells the decoder the format of the
| data coming up (compression method, window size, etc.).
|
| The following grey section is the Huffman coding tables - more
| common characters in the input are encoded in fewer bits. This
| is what later tells the decoder that 000 means 'e' and 1110110
| means 'I'.
|
| Getting into the content now - this is where the decoder can
| start emitting the uncompressed text. The first 3 purple
| characters are the UTF-8 bytes of the fancy opening quote -
| because they're rare in this text, they're each encoded as 6 or
| 7 bits. Because they take a lot of bits, this website is
| showing them as a purple color, as well as physically wider.
| The nearby 't' is encoded in 4 bits, 0110, and is represented
| in a bluer color.
|
| The orange bits you've mentioned are back references - "x10 <-
| 26" here means "go back 26 characters in what you've decoded,
| and then copy 10 characters again." In this way, we can
| represent "t was the " in only 12 bits, because we've seen it
| previously.
|
| The grey at the end is a special "end of stream" marker,
| followed by a red checksum which allows decoders to make sure
| there wasn't any corruption in the input.
|
| I think that's everything. Further reading:
| https://en.wikipedia.org/wiki/Zlib
| https://en.wikipedia.org/wiki/Deflate
| https://en.wikipedia.org/wiki/Huffman_coding
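The header and checksum mentioned in the walkthrough above can be inspected with Python's standard `zlib` module. This is a rough sketch: the sample text is a shortened plain-ASCII version of the paragraph, and the field layout follows the zlib format (2-byte header, deflate data, 4-byte big-endian Adler-32).

```python
import zlib

text = (
    "It was the best of times, it was the worst of times, "
    "it was the age of wisdom, it was the age of foolishness"
).encode("utf-8")
stream = zlib.compress(text, 9)

# First two bytes: CMF and FLG.
cmf, flg = stream[0], stream[1]
method = cmf & 0x0F                          # 8 means deflate
window_bits = (cmf >> 4) + 8                 # log2 of the LZ77 window size
header_ok = (cmf * 256 + flg) % 31 == 0      # FLG includes check bits

# Last four bytes: Adler-32 of the uncompressed data, big-endian.
trailer = int.from_bytes(stream[-4:], "big")
checksum_ok = trailer == zlib.adler32(text)

print(f"method={method} window_bits={window_bits} "
      f"header_ok={header_ok} checksum_ok={checksum_ok}")
```

Everything between the two header bytes and the four trailer bytes is the deflate stream that the visualizer colours in.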
| ale42 wrote:
| This is great! Just missing a way to understand how the
| parameters are encoded, or is there something somewhere?
| chmod775 wrote:
| The byte counter seems broken somehow. "Compressing" a single
| character with a compression level of 0 says "12 bytes", yet in
| the visualization there's less than 8 bytes (~7.5).
|
| When compressing with a level higher than 0, the bits also don't
| appear to add up to a natural number of bytes, so I'm thinking
| the visualization is missing some padding?
| duskwuff wrote:
| At least for me, compressing a single "a" at compression level
| 0 gives me an output of 91 bits, which rounds up to 12 bytes.
| chordbug wrote:
| I fixed the bug after reading the comment you replied to :)
| (I'm @lynn)
|
| The LEN and NLEN items were not getting visualized.
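To see where the 91 bits and 12 bytes discussed above come from, here is a small sketch that picks apart the level-0 output for a single "a" using Python's standard `zlib` module. It assumes the compressor emits one final stored block for this input, which is what a stored (BTYPE 0) block layout looks like: a 3-bit header, padding to the next byte boundary, then LEN, NLEN, and the raw bytes.

```python
import zlib

stream = zlib.compress(b"a", 0)

# Bytes 0-1: zlib header. Byte 2 holds the 3-bit block header;
# its remaining 5 bits are padding up to the byte boundary.
block_header = stream[2]
bfinal = block_header & 1
btype = (block_header >> 1) & 0b11           # 0 = stored (no compression)

# LEN and NLEN are 16-bit little-endian; NLEN is the one's complement of LEN.
length = int.from_bytes(stream[3:5], "little")
nlength = int.from_bytes(stream[5:7], "little")
payload = stream[7:7 + length]
adler = int.from_bytes(stream[7 + length:], "big")

# Meaningful bits: zlib header + block header + LEN + NLEN + data + checksum.
bits_used = 16 + 3 + 16 + 16 + 8 * length + 32
padding_bits = len(stream) * 8 - bits_used
```

With one byte of payload, `bits_used` is 91 and `padding_bits` is 5, which together make the 96 bits (12 bytes) of output.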
| jonjonsonjr wrote:
| Something must be in the air. I've been working on a gzip/deflate
| visualizer recently as well:
| https://jonjohnsonjr.github.io/deflate/
|
| This is very much a work in progress, but for folks looking for a deeper
| explanation of how dynamic blocks are encoded, this is my attempt
| to visualize them.
|
| (This all happens locally with way too much wasm, so attempting
| to upload a large gzip file will likely crash the tab.)
|
| tl;dr for btype 2 blocks:
|
| 3 bit block header.
|
| Three values telling you how many extra (above the minimum
| number) symbols are in each tree: HLIT, HDIST, and HCLEN.
|
| First, we read (HCLEN + 4) * 3 bits.
|
| These are the bit counts for symbols 0-18 in the code length
| tree, which gives you the bit patterns for a little mini-language
| used to compactly encode the literal/length and distance trees.
| 0-15 are literal bit lengths (0 meaning it's omitted). 16 repeats
| the previous symbol 3-6 times. 17 and 18 encode short (3-10) and
| long (11-138) runs of zeroes, which is useful for encoding blocks
| with sparse alphabets.
|
| These bit counts are in a seemingly strange order that tries to
| push less-likely bit counts towards the end of the list so it can
| be truncated.
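A sketch of that reading step, assuming the storage order given in RFC 1951 (section 3.2.7) and deflate's least-significant-bit-first packing. The helper names `read_bits` and `read_code_length_lengths` are invented for this example.

```python
# Order in which the code length code's bit counts are stored (RFC 1951).
CL_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]


def read_bits(data, pos, n):
    """Read n bits starting at bit offset pos, least significant bit first."""
    value = 0
    for i in range(n):
        byte = data[(pos + i) // 8]
        bit = (byte >> ((pos + i) % 8)) & 1
        value |= bit << i
    return value, pos + n


def read_code_length_lengths(data, pos, hclen):
    """Read (hclen + 4) three-bit counts and place them by CL_ORDER."""
    lengths = [0] * 19                        # unlisted symbols stay at 0
    for i in range(hclen + 4):
        count, pos = read_bits(data, pos, 3)
        lengths[CL_ORDER[i]] = count
    return lengths, pos


# HCLEN = 0 means only four counts are stored: for symbols 16, 17, 18 and 0.
packed = (2 | 3 << 3 | 3 << 6 | 1 << 9).to_bytes(2, "little")
cl_lengths, end_pos = read_code_length_lengths(packed, 0, 0)
```

Here symbols 16, 17, 18 and 0 get lengths 2, 3, 3 and 1, and the other fifteen symbols are treated as unused because the list was truncated.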
|
| Knowing all the bit lengths for values in this alphabet allows
| you to reconstruct a huffman tree (thanks to canonical huffman
| codes) and decode the bit patterns for these code length codes.
|
| That's followed by a bitstream that you decode to get the bit
| counts for the literal/length and distance trees. HLIT and HDIST
| (from earlier) tell you how many of these to expect.
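The mini-language from earlier (symbols 0-18) can be expanded into a flat list of bit counts roughly like this. The sketch assumes the Huffman decoding has already happened and takes hypothetical `(symbol, extra)` pairs, where `extra` is the value of the extra bits that follow symbols 16, 17 and 18.

```python
def expand_code_lengths(tokens):
    """Expand (symbol, extra) pairs into a flat list of bit lengths."""
    lengths = []
    for symbol, extra in tokens:
        if 0 <= symbol <= 15:
            lengths.append(symbol)                       # literal bit length
        elif symbol == 16:
            if not lengths:
                raise ValueError("16 has no previous length to repeat")
            lengths.extend([lengths[-1]] * (3 + extra))  # repeat 3-6 times
        elif symbol == 17:
            lengths.extend([0] * (3 + extra))            # 3-10 zeroes
        elif symbol == 18:
            lengths.extend([0] * (11 + extra))           # 11-138 zeroes
        else:
            raise ValueError(f"invalid code length symbol {symbol}")
    return lengths


expanded = expand_code_lengths([(8, 0), (16, 3), (17, 0), (18, 127), (5, 0)])
```

A real decoder would keep going until it has collected the number of lengths that HLIT and HDIST call for, then split the list into the literal/length part and the distance part.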
|
| Again, you can reconstruct these trees using just the bit lengths
| thanks to canonical huffman codes, which gives you the bit
| patterns for the data bitstream.
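Rebuilding the bit patterns from bit lengths alone follows the canonical code construction in RFC 1951 (section 3.2.2). The function name below is made up, and codes are returned as strings of 0s and 1s purely for readability.

```python
def canonical_codes(lengths):
    """Map each used symbol to its canonical Huffman code (as a bit string)."""
    max_len = max(lengths, default=0)

    # Step 1: count how many codes there are of each bit length.
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1

    # Step 2: find the smallest code value for each bit length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code

    # Step 3: hand out consecutive codes to symbols in symbol order.
    codes = {}
    for symbol, n in enumerate(lengths):
        if n:                                  # length 0 means "unused"
            codes[symbol] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes


# Example from RFC 1951: symbols A..H with lengths 3,3,3,3,3,2,4,4.
example = canonical_codes([3, 3, 3, 3, 3, 2, 4, 4])
```

For the RFC's example this gives F the code `00`, A through E the codes `010` to `110`, and G and H the codes `1110` and `1111`.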
|
| Then you just decode the rest of the bitstream (using LZSS) until
| you hit 256, the end of block (EOB).
|
| If you're not already familiar with deflate, don't be discouraged
| if none of that made any sense. Bill Bird has an excellent (long)
| lecture that I recommend to everyone:
| https://www.youtube.com/watch?v=SJPvNi4HrWQ
| DamonHD wrote:
| I would really like to see one of these for brotli.
|
| Also for zopfli vs level 9 compression with this tool as-is.
___________________________________________________________________
(page generated 2025-09-29 23:01 UTC)