[HN Gopher] WebP: The WebPage Compression Format
___________________________________________________________________
WebP: The WebPage Compression Format
Author : Kubuxu
Score : 493 points
Date   : 2024-09-07 17:32 UTC (1 day ago)
(HTM) web link (purplesyringa.moe)
(TXT) w3m dump (purplesyringa.moe)
| next_xibalba wrote:
| If only we hadn't lost Jan Sloot's Digital Coding System [1],
| we'd be able to transmit GB in milliseconds across the web!
|
| [1] https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System
| supriyo-biswas wrote:
| This claim itself is probably a hoax and not relevant to the
| article at hand; but these days with text-to-image models and
| browser support, you could probably do something like <img
| prompt="..."> and have the browser render something that
| matches the description, similar to the "cookbook" analogy used
| in the Wikipedia article.
| Lorin wrote:
| That's an interesting concept, although it would generate a
| ton of bogomips since each client has to generate the image
| themselves instead of once on the server.
|
| You'd also want "seed" and "engine" attributes to ensure all
| visitors see the same result.
| LeoPanthera wrote:
| You could at least push the work closer to the edge, by
| having genAI servers on each LAN, and in each ISP, similar
| to the idea of a caching web proxy before HTTPS rendered
| them impossible.
| lucianbr wrote:
| Push the work closer to the edge, _and multiply_ it by
| quite a lot. Generate each image many times. Why would we
| want this? Seems like the opposite of caching in a sense.
| sroussey wrote:
| If you are reading a web page on mars and bandwidth is
| more precious than processing power, then <img
| prompt="..."> might make sense.
|
| Not so much for us on earth however.
| roywiggins wrote:
| This sort of thing (but applied to video) is a plot point
| in _A Fire Upon The Deep_. Vinge's term for the result
| is an "evocation."
| bobbylarrybobby wrote:
| All compression is, in a sense, the opposite of caching.
| You have to do more work to get the data, but you save
| space.
| onion2k wrote:
| Unless you don't actually care if everyone sees the same
| results. So long as the generated image is approximately
| what you prompted for, and the content of the image is
| decorative so it doesn't really need to be a specific,
| accurate representation of something, it's fine to display
| a different picture for every user.
|
| One of the best uses of responsive design I've ever seen
| was a site that looked _completely_ different at different
| breakpoints - different theme, font, images, and content.
| It was beautiful, and creative, and fun. Lots of users
| saw different things and had no idea other versions were
| there.
| semolino wrote:
| What site are you referring to?
| 7bit wrote:
| That would require a lot of GBs in libraries in the browser
| and a lot of processing power on the client CPU to render an
| image that is so unimportant that it doesn't really matter if
| it shows exactly what the author intended. To summarize that
| in three words: a useless technology.
|
| That idea is something that is only cool in theory.
| lxgr wrote:
| At least for LLMs, something very similar is already
| happening: https://huggingface.co/blog/Xenova/run-gemini-
| nano-in-your-b...
|
| Currently, we're definitely not there in terms of
| space/time tradeoffs for images, but I could imagine at
| least parameterized ML-based upscaling (i.e. ship a low-
| resolution image and possibly a textual description, have a
| local model upscale it to display resolution) at some
| point.
| ec109685 wrote:
| Similar to what Samsung does if you take a picture of the
| moon.
| everforward wrote:
| Similar ideas have floated around for a while. I've always
| enjoyed the elegance of compressing things down to a start
| and end index of digits of Pi.
|
| It's utterly impractical, but fun to muse about how neat it
| would be if it weren't.
| LtWorf wrote:
| AFAIK it's not been proved that every combination exists
| in pi.
|
| By comparison, you could easily define a number that goes
| 0.123456789101112131415... and use indexes into that number.
| However the index would probably be larger than what you're
| trying to encode.
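|
| A toy sketch of why (illustration only, nothing from the
| article): build a prefix of 0.123456789101112... and look up
| where a short "message" first appears. The index you would
| have to transmit tends to have about as many digits as the
| message itself.
|
|     // illustration: "compress" a digit string into an index
|     // into the 0.123456789101112... constant
|     let digits = "";
|     for (let i = 1; digits.length < 2_000_000; i++) digits += i;
|     const message = "271828"; // any short digit string
|     const index = digits.indexOf(message);
|     // the index has roughly as many digits as the message
|     console.log(index, String(index).length, message.length);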
| everforward wrote:
| Huh, I presumed that any non-repeating irrational number
| would include all number sequences, but I think you're
| right. Even a sequence containing only 1s and 0s could be
| non-repeating.
|
| I am curious what the compression ratios would be. I
| suspect the opposite, but the numbers are at a scale
| where my mind falters so I wouldn't say that with any
| confidence. Just 64 bits can get you roughly 10^20 digits
| into the number, and the "reach" grows exponentially with
| bits. I would expect that the smaller the file, the more
| common its sequence is.
| Groxx wrote:
| Time Lords probably, saving us from the inevitable end of this
| technology path, where all data in the universe is compressed
| into one bit which leads to an information-theoretic black hole
| that destroys everything.
| magicalhippo wrote:
| Reminds me of when I was like 13 and learned about CRC codes
| for the first time. Infinite compression here we come! Just
| calculate the 32bit CRC code for say 64 bits, transmit the CRC,
| then on the other hand just loop over all possible 64 bit
| numbers until you got the same CRC. So brilliant! Why wasn't
| this already used?!
|
| Of course, the downsides became apparent once the euphoria had
| faded.
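|
| A scaled-down sketch of where it falls apart (made-up 8-bit
| checksum instead of CRC32, so the brute force finishes
| instantly):
|
|     // 24-bit "files", 8-bit checksum: a stand-in for CRC32
|     const checksum8 = (n) => (n ^ (n >>> 8) ^ (n >>> 16)) & 0xff;
|     const original = 0xabcdef;
|     const sum = checksum8(original);
|     let recovered = -1;
|     for (let i = 0; i < 1 << 24; i++) {
|       if (checksum8(i) === sum) { recovered = i; break; }
|     }
|     // prints "89 abcdef": the first match is not the original.
|     console.log(recovered.toString(16), original.toString(16));
|     // 2^24 inputs share 2^8 checksums, so ~65536 files per value;
|     // with 64-bit data and CRC32 it's ~4 billion files per value.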
| _factor wrote:
| Even better, take a physical object and slice it precisely in
| a ratio that contains your data in the fraction!
| lxgr wrote:
| Very relatable! "If this MD5 hash uniquely identifies that
| entire movie, why would anyone need to ever send the
| actual... Oh, I get it."
|
| Arguably the cruelest implication of the pigeonhole
| principle.
| Spivak wrote:
| The real version of this is Nvidia's web conferencing demo
| where they make a 3d model of your face and then only transfer
| the wireframe movements which is super clever.
|
| https://m.youtube.com/watch?v=TQy3EU8BCmo
|
| You can really feel the "compute has massively outpaced
| networking speed" where this kind of thing is actually
| practical. Maybe I'll see 10G residential in my lifetime.
| kstrauser wrote:
| This messages comes to you via a 10Gbps, $50 residential
| fiber connection.
|
| The future is already here - it's just not very evenly
| distributed.
| lxgr wrote:
| > The Sloot Digital Coding System is an alleged data sharing
| technique that its inventor claimed could store a complete
| digital movie file in 8 kilobytes of data
|
| 8 kilobytes? Rookie numbers. I'll do it in 256 bytes, as long
| as you're fine with a somewhat limited selection of available
| digital movie files ;)
| MBCook wrote:
| I can shrink any file down to just 32 bits using my unique
| method.
|
| I call it the High Amplitude Shrinkage Heuristic, or H.A.S.H.
|
| It is also reversible, but only safely to the last encoded
| file due to quantum hyperspace entanglement of ionic bonds.
| H.A.S.H.ing a different file will disrupt them preventing
| recovery of the original data.
| niceguy4 wrote:
| Not to side track the conversation but to side track the
| conversation, have there been many other major WebP exploits like
| the serious one in the past?
| BugsJustFindMe wrote:
| > _the longest post on my site, takes 92 KiB instead of 37 KiB.
| This amounts to an unnecessary 2.5x increase in load time_
|
| Sure, if you ignore latency. In reality it's an unnecessary
| 0.001% increase in load time because that size increase isn't
| enough to matter vs the round trip time. And the time you save
| transmitting 55 fewer KiB is probably less than the time lost to
| decompression. :p
|
| While fun, I would expect this specific scenario to actually be
| worse for the user experience not better. Speed will be a
| complete wash and compatibility will be worse.
| jsnell wrote:
| That size difference is large enough to make a difference in
| the number of round trips required (should be roughly one fewer
| roundtrip with any sensible modern value for the initial
| congestion window).
|
| Won't be a 2.5x difference, but also not 0.001%.
| BugsJustFindMe wrote:
| You don't need a new roundtrip for every packet. That would
| be devastating for throughput. One vs two vs three file
| packets get acked as a batch either way, not serially.
|
| Also when you get to the end, you then see
|
| > _The actual savings here are moderate: the original is 88
| KiB with gzip, and the WebP one is 83 KiB with gzip. In
| contrast, Brotli would provide 69 KiB._
|
| At 69 KiB you're still over the default TCP packet max, which
| means both cases transmit the same number of packets, one
| just has a bunch of extra overhead added for the extra
| JavaScript fetch, load, and execute.
|
| The time saved here is going to be negligible at best
| anyway, and it looks to actually be _negative_, because
| we're burning time without reducing the number of needed
| packets at all.
| jsnell wrote:
| Those numbers are for a different page. For the original
| page, the article quotes 44 kB with this method vs. 92 kB
| for gzip.
|
| > At 69 KiB you're still over the default TCP packet max,
| which means both cases transmit the same number of packets,
|
| What? No, they absolutely don't transmit the same number of
| packets. Did you mean some other word?
| codetrotter wrote:
| They were probably thinking of the max size for packets
| in TCP, which is 64K (65535 bytes).
|
| However, Ethernet has a MTU (Maximum Transmission Unit)
| of 1500 bytes. Unless jumbo frames are used.
|
| And so I agree with you, the number of packets that will
| be sent for 69 KiB vs 92 KiB will likely be different.
| mananaysiempre wrote:
| I expect what GP meant is the default TCP _window_ size,
| so in a situation where bandwidth costs are dwarfed by
| roundtrip costs, these two cases will end up taking
| essentially the same time, because they will incur the
| same number of ACK roundtrips. Don't know if the numbers
| work out, but they at least sound plausible.
| BugsJustFindMe wrote:
| Yes, sorry
| jsnell wrote:
| No, there is no way the numbers would work out to the
| same number of roundtrips. The sizes are different by a
| factor of 2.5x, and the congestion window will only
| double in a single roundtrip. The only way the number of
| roundtrips would be the same is if both transfers fit in
| the initial congestion window.
| pierrec wrote:
| Interesting, how does that add a round trip? For the record
| here's what I believe to be the common definition of an
| additional "round trip", in a web development context:
| - client requests X - client gets X, which contains a
| reference to Y - therefore client requests Y
|
| So you're starting a new request that depends on the client
| having received the first one. (although upon closer
| inspection I think the technique described in the blog post
| manages to fit everything into the first response, so I'm not
| sure how relevant this is)
| jsnell wrote:
| Unless a resource is very small, it won't be transmitted in
| a single atomic unit. The sender will only send a part of
| it, wait for the client to acknowledge having received it,
| and only then send more. That requires a network roundtrip.
| The larger the resource, the more network roundtrips will
| be required.
|
| If you want to learn more, pretty much any resource on TCP
| should explain this stuff. Here's something I wrote years
| ago, the background section should be pretty applicable:
| https://www.snellman.net/blog/archive/2017-08-19-slow-
| ps4-do...
| crote wrote:
| In reality it's more like:
|
|     - client requests X
|     - server sends bytes 0-2k of X
|     - client acknowledges bytes 0-2k of X
|     - server sends bytes 2k-6k of X
|     - client acknowledges bytes 2k-6k of X
|     - server sends bytes 6k-14k of X
|     - client acknowledges bytes 6k-14k of X
|     - server sends bytes 14k-30k of X
|     - client acknowledges bytes 14k-30k of X
|     - server sends bytes 30k-62k of X
|     - client acknowledges bytes 30k-62k of X
|     - server sends bytes 62k-83k of X
|     - client acknowledges bytes 62k-83k of X
|     - client has received X, which contains a reference to Y
|     - therefore client requests Y
|
| It's all about TCP congestion control here. There are
| dozens of algorithms used to handle it, but in pretty much
| all cases you want to have _some_ kind of slow buildup in
| order to avoid completely swamping a slower connection and
| having all but the first few of your packets getting
| dropped.
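|
| A crude model of the same thing, if you want to play with the
| numbers (assumes a 1460-byte MSS and an initial congestion
| window of 10 segments, which are common but not universal
| defaults, and ignores headers, loss and ACK pacing):
|
|     const MSS = 1460, INIT_CWND = 10;
|     function roundTrips(bytes) {
|       let sent = 0, cwnd = INIT_CWND, rtts = 0;
|       while (sent < bytes) {
|         sent += cwnd * MSS; // one flight per round trip
|         cwnd *= 2;          // slow start roughly doubles it
|         rtts++;
|       }
|       return rtts;
|     }
|     console.log(roundTrips(37 * 1024),
|                 roundTrips(44 * 1024),
|                 roundTrips(92 * 1024)); // -> 2 3 3
|
| So whether a given saving removes a round trip depends on
| where the sizes fall relative to those doubling windows.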
| notpushkin wrote:
| > client acknowledges bytes 0-2k of X
|
| Doesn't client see reference to Y at this point? Modern
| browsers start parsing HTML even before they receive the
| whole document.
| Timwi wrote:
| Not just modern. This was even more significant on slow
| connections, so they've kind of always done that. One
| could even argue that HTML, HTTP (specifically, chunked
| encoding) and gzip are all intentionally designed to
| enable this.
| Retr0id wrote:
| Why is there more latency?
|
| Edit: Ah, I see OP's code requests the webp separately. You can
| avoid the extra request if you write a self-extracting
| html/webp polyglot file, as is typically done in the demoscene.
| BugsJustFindMe wrote:
| It takes more time for your message to get back and forth
| between your computer and the server than it takes for the
| server to pump out some extra bits.
|
| Even if you transmit the js stuff inline, the op's notion of
| time still just ignores the fact that it takes the caller
| time to even ask the server for the data in the first place,
| and at such small sizes that time swallows the time to
| transmit from the user's perspective.
| Retr0id wrote:
| Here's a demo that only uses a single request for the whole
| page load: https://retr0.id/stuff/bee_movie.webp.html
|
| It is technically 2 requests, but the second one is a cache
| hit, in my testing.
| BugsJustFindMe wrote:
| That's fine, but if you're evaluating the amount of time
| it takes to load a webpage, you cannot ignore the time it
| takes for the client request to reach your server in the
| first place or for the client to then unpack the data.
| The time saved transmitting such a small number of bits
| will be a fraction of the time spent making that initial
| request anyway. That's all I'm saying.
|
| OP is only looking at transmit size differences, which is
| both not the same as transmit time differences and also
| not what the user actually experiences when requesting
| the page.
| purplesyringa wrote:
| Hmm? I'm not sure where you're taking that from. The webp is
| inlined.
| Retr0id wrote:
| Ah, so it is! I was skim-reading and stopped at `const
| result = await fetch("compressor/compressed.webp");`
| jgalt212 wrote:
| I have similar feelings on js minification especially if you're
| sending via gzip.
| edflsafoiewq wrote:
| Well, that, and there's an 850K
| Symbols-2048-em%20Nerd%20Font%20Complete.woff2 file that sort
| of drowns out the difference, at least if it's not in cache.
| lxgr wrote:
| Now I got curious, and there's also a 400 kB CSS file to go
| with it: https://purplesyringa.moe/fonts/webfont.css
|
| I'm not up to date on web/font development - does anybody
| know what that does?
| bobbylarrybobby wrote:
| It adds unicode characters before elements with the given
| class. Then it's up to the font to display those Unicode
| characters -- in this case, based on the class names, one
| can infer that the font assigns an icon to each character
| used.
| lxgr wrote:
| That makes sense, thank you!
|
| So the purpose is effectively to have human-readable CSS
| class names to refer to given glyphs in the font, rather
| than having stray private use Unicode characters in the
| HTML?
| lobsterthief wrote:
| Yep
|
| This is a reasonable approach if you have a large number
| of icons across large parts of the site, but you should
| always compile the CSS/icon set down to only those used.
|
| If only a few icons, and the icons are small, then
| inlining the SVG is a better option. But if you have too
| many SVGs directly embedded on the site, the page size
| itself will suffer.
|
| As always with website optimization, whether something is
| a good option "depends".
| lifthrasiir wrote:
| I think at least some recent tools will produce ligatures
| to turn a plain text into an icon to avoid this issue.
| alpaca128 wrote:
| Another detail is that this feature breaks and makes some
| sites nearly unusable if the browser is set to ignore a
| website's custom fonts.
| yencabulator wrote:
| More reasonable than this class+CSS would be e.g. a
| React/static-website-template/etc custom element that
| outputs the correct glyph. The output doesn't need to
| contain this indirection, nor all of the possibilities.
| marcellus23 wrote:
| Wow, yeah. That kind of discredits the blog author a bit.
| edflsafoiewq wrote:
| I mean, it's all just for fun of course.
| purplesyringa wrote:
| It's cached. Not ideal, sure, and I'll get rid of that
| bloat someday, but that file is not mission-critical and
| the cost is amortized between visits. I admit my fault
| though.
| marcellus23 wrote:
| Definitely fair and I was a bit harsh. It just seemed a
| bit nonsensical to go through such efforts to get rid of
| a few kilobytes while serving a massive font file. But I
| also understand that it's mostly for fun :)
| est wrote:
| seriously, why can't modern browsers turn off features like
| remote fonts, WebRTC, etc. in settings? I hate it when I've
| read a bit and then the font changes. Not to mention the
| fingerprinting risks.
| BugsJustFindMe wrote:
| You can, and then when someone uses an icon font instead of
| graphics their page breaks.
| Dalewyn wrote:
| Skill issue.
|
| Pictures are pictures, text is text. <img> tag exists for
| a reason.
| lenkite wrote:
| It's a convenience and packaging issue. An icon font is simply
| more convenient to handle. <img> tags for a hundred
| images requires more work.
|
| Icon fonts are used all over the place - look at the
| terminal nowadays. Most TUI's require an icon font to be
| installed.
| Dalewyn wrote:
| ><img> tags for a hundred images requires more work.
|
| So it's a skill issue.
| est wrote:
| I believe many Web APIs can be enabled/disabled on a
| website basis, no?
| collinmanderson wrote:
| iPhone lock down mode turns off remote fonts (including
| breaking font icons as sibling says)
| lxgr wrote:
| That's certainly reasonable if you optimize only for loading
| time (and make certain assumptions about everybody's available
| data rate), but sometimes I really wish website (and more
| commonly app) authors wouldn't make that speed/data tradeoff so
| freely on my behalf, for me to find out _after_ they've
| already pushed that extra data over my metered connection.
|
| The tragedy here is that while some people, such as the author
| of TFA, go to great lengths to get from about 100 to 50 kB,
| others don't think twice to send me literally tens of megabytes
| of images, when I just want to know when a restaurant is open -
| on roaming data.
|
| Resource awareness exists, but it's unfortunately very unevenly
| distributed.
| k__ wrote:
| We need a Mosh-based browser with Gopher support.
| nox101 wrote:
| and you'd need AI to tell you what's in the pictures because
| lots of restaurant sites just have photos of their menu and
| some designer with no web knowledge put their phone number,
| address, and hours in an image designed in Photoshop
| tacone wrote:
| You're out of luck. Some time ago I tried some photos
| of pasta dishes with Gemini and it could not guess the
| recipe name.
| mintplant wrote:
| I've found Gemini to be pretty terrible at vision tasks
| compared to the competition. Try GPT-4o or
| Claude-3.5-Sonnet instead.
| zamadatix wrote:
| There is an interesting "Save-Data" header to let a site know
| which makes sense to optimize for on connection but it seems
| to be Chrome only so far https://caniuse.com/?search=save-
| data
|
| I wish there was a bit of an opposite option - a "don't
| lazy/partially load anything" for those of us on fiber
| watching images pop up as we scroll past them in the page
| that's been open for a minute.
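|
| For what it's worth, the client-side version of the signal is
| a one-liner to check (Chromium-only, like the header;
| navigator.connection doesn't exist elsewhere, hence the
| guard):
|
|     const saveData = !!(navigator.connection &&
|                         navigator.connection.saveData);
|     if (saveData) {
|       // e.g. keep low-res placeholders instead of fetching
|       // full-size images
|       console.log("User asked for reduced data usage");
|     }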
| kelnos wrote:
| > _... on roaming data._
|
| A little OT, and I'm not sure if iOS has this ability, but I
| found that while I'm traveling, if I enable Data Saver on my
| (Android) phone, I can easily go a couple weeks using under
| 500MB of cellular data. (I also hop onto public wifi whenever
| it's available, so being in a place with lots of that is
| helpful.)
|
| My partner, who has an iPhone, and couldn't find an option
| like that (maybe it exists; I don't think she tried very hard
| to find it), blew through her 5GB of free high-speed roaming
| data (T-Mobile; after that you get 256kbps, essentially
| unusable) in 5 or 6 days on that same trip.
|
| It turns out there's _so_ much crap going on in the background,
| and it's all so unnecessary for the general user experience.
| And I bet it saves battery too. Anything that uses Google's
| push notifications system still works fine and gets timely
| notifications, as IIRC that connection is exempt from the
| data-saving feature.
|
| I've thought about leaving Data Saver on all the time, even
| when on my home cellular network. Should probably try it and
| see how it goes.
|
| But overall, yes, it would be great if website designers
| didn't design as if everyone is on an unmetered gigabit link
| with 5ms latency...
| YourOrdinaryCat wrote:
| Settings > Cellular > Cellular Data Options > Low Data Mode
| is the iOS equivalent.
| sleepydog wrote:
| Seriously, if you're saving less than a TCP receive window's
| worth of space it's not going to make any difference to
| latency.
|
| I suppose it could make a difference on lossy networks, but I'm
| not sure.
| lelandfe wrote:
| If the blob contains requests (images, but also stylesheets,
| JS, or worst case fonts), it will actually be a net
| negative for latency. The browser's preload scanner begins
| fetching resources even before the HTML is finished being
| parsed. That can't happen if the HTML doesn't exist until
| after JS decodes it. In other words, the entire body has
| become a blocking resource.
|
| These are similar conversations people have around hydration,
| by the by.
| rrr_oh_man wrote:
| _> These are similar conversations people have around
| hydration_
|
| For the uninitiated:
| https://en.m.wikipedia.org/wiki/Hydration_(web_development)
| fsndz wrote:
| exactly what I thought too
| zahlman wrote:
| Actually, I was rather wondering about that claim, because it
| seems accidentally cherry-picked. Regarding that post:
|
| > This code minifies to about 550 bytes. Together with the WebP
| itself, this amounts to 44 KiB. In comparison, gzip was 92 KiB,
| and Brotli would be 37 KiB.
|
| But regarding the current one:
|
| > The actual savings here are moderate: the original is 88 KiB
| with gzip, and the WebP one is 83 KiB with gzip. In contrast,
| Brotli would provide 69 KiB. Better than nothing, though.
|
| Most of the other examples don't show dramatic (like more than
| factor-of-2) differences between the compression methods
| either. In my own local testing (on Python wheel data, which
| should be mostly Python source code, thus text that's full of
| common identifiers and keywords) I find that XZ typically
| outperforms gzip by about 25%, while Brotli doesn't do any
| better than XZ.
| lifthrasiir wrote:
| XZ was never considered as a compression algorithm to build
| into web browsers to start with. A Brotli decoder is
| already there for HTTP, so it has been proposed to expose
| full Brotli encoder and decoder APIs, as it shouldn't take
| too much effort to add an encoder and expose both.
|
| Also, XZ (or LZMA/LZMA2 in general) produces smaller
| compressed output than Brotli given lots of free time, but is
| much slower than Brotli when targeting the same compression
| ratio. This is because LZMA/LZMA2 uses an adaptive range
| coder and multiple code distribution contexts, both of which
| contribute heavily to the slowness when higher compression
| ratios are requested. Brotli only has the latter, and its
| coding is just a bitwise Huffman coder.
| oefrha wrote:
| It's not just decompression time. They need to download the
| whole thing before decompression, whereas the browser can
| decompress and render HTML as it's streamed from the server. If
| the connection is interrupted you lose everything, instead of
| being able to read the part you've downloaded.
|
| So, for any reasonable connection the difference doesn't
| matter; for actually gruesomely slow/unreliable connections
| where 50KB matters this is markedly worse. While a fun
| experiment, please don't do it on your site.
| robocat wrote:
| Other major issues that I had to contend with:
|
| 1: browsers choose when to download files and run JavaScript.
| It is not as easy as one might think to force JavaScript to
| run immediately as high priority (which it needs to be when
| it is on the critical path to painting).
|
| 2: you lose certain browser optimisations where normally many
| things are done in parallel. Instead you are introducing
| delays into the critical path, and those delays might not be worth
| the "gain".
|
| 3: Browsers do great things to start requesting files in
| parallel as files are detected with HTML/CSS. Removing that
| feature can be a poor tradeoff.
|
| There are a few other unobvious downsides. I would never
| deploy anything like that to a production site without
| serious engineering effort to measure the costs and benefits.
| hot_gril wrote:
| I wish ISPs would advertise latency instead of just bandwidth.
| It matters a lot for average users, especially now that
| bandwidth is generally plentiful.
| 98469056 wrote:
| While peeking at the source, I noticed that the doctype
| declaration is missing a space. It currently reads
| <!doctypehtml>, but it should be <!doctype html>
| BugsJustFindMe wrote:
| Maybe their javascript adds it back in :)
| palsecam wrote:
| `<!doctype html>` can be minified into `<!doctypehtml>`.
|
| It's, strictly speaking, invalid HTML, but it still
| successfully triggers standards mode.
|
| See https://GitHub.com/kangax/html-minifier/pull/970 /
| https://HTML.spec.WHATWG.org/multipage/parsing.html#parse-er...
|
| (I too use that trick on https://FreeSolitaire.win)
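|
| If you want to check for yourself, something like this in a
| browser console should report standards mode for the
| space-less doctype:
|
|     const doc = new DOMParser()
|       .parseFromString("<!doctypehtml><title>t</title>", "text/html");
|     console.log(doc.compatMode); // "CSS1Compat" = standards mode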
| saagarjha wrote:
| Why would you do this?
| palsecam wrote:
| To have FreeSolitaire.win homepage be only _20.7 kB_ over
| the wire.
|
| That's for _the whole game_: graphics are inline SVGs, JS
| & CSS are embedded in <script> and <style> elements.
| KTibow wrote:
| Sounds like this is from a minifier that removes as much as it
| can [0]
|
| 0:
| https://github.com/KTibow/KTibow/issues/3#issuecomment-23367...
| gkbrk wrote:
| > Why readPixels is not subject to anti-fingerprinting is beyond
| me. It does not sprinkle hardly visible typos all over the page,
| so that works for me.
|
| > keep the styling and the top of the page (about 8 KiB
| uncompressed) in the gzipped HTML and only compress the content
| below the viewport with WebP
|
| Ah, that explains why the article suddenly cut off after a random
| sentence, with an empty page that follows. I'm using LibreWolf
| which disables WebGL, and I use Chromium for random web games
| that need WebGL. The article worked just fine with WebGL enabled,
| neat technique to be honest.
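|
| For the curious, the readPixels route looks roughly like this
| (a minimal sketch, not the article's exact code - the article
| inlines the WebP instead of fetching it): attach the decoded
| image to a framebuffer as a texture and read the raw RGBA
| bytes back, which the article notes is not subject to the
| same noise as 2D-canvas getImageData.
|
|     async function webpToBytes(url) {
|       const img = await createImageBitmap(
|         await (await fetch(url)).blob());
|       const canvas = document.createElement("canvas");
|       canvas.width = img.width;
|       canvas.height = img.height;
|       const gl = canvas.getContext("webgl");
|       if (!gl) throw new Error("WebGL unavailable");
|       const tex = gl.createTexture();
|       gl.bindTexture(gl.TEXTURE_2D, tex);
|       gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA,
|                     gl.UNSIGNED_BYTE, img);
|       const fb = gl.createFramebuffer();
|       gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
|       gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
|                               gl.TEXTURE_2D, tex, 0);
|       const out = new Uint8Array(img.width * img.height * 4);
|       gl.readPixels(0, 0, img.width, img.height, gl.RGBA,
|                     gl.UNSIGNED_BYTE, out);
|       return out; // raw RGBA bytes; mind row order if the
|                   // exact byte layout matters
|     }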
| niutech wrote:
| It isn't neat as long as it doesn't work with all modern web
| browsers (even with fingerprinting protection) and doesn't have
| a fallback for older browsers. WWW should be universally
| accessible and progressively enhanced, starting with plain
| HTML.
| afavour wrote:
| It isn't a serious proposal. It's a creative hack that no
| one, author included, is suggesting should be used in
| production.
| kjhcvkek77 wrote:
| This philosophy hands your content on a silver platter to AI
| companies, so they can rake in money while giving nothing
| back to the author.
| latexr wrote:
| I don't support LLM companies stealing content and
| profiting from it without contributing back. But if you're
| going to fight that by making things more difficult for
| humans, especially those with accessibility needs, then
| what even is the point of publishing anything?
| Retr0id wrote:
| I've used this trick before! Oddly enough I can't remember _what_
| I used it for (perhaps just to see if I could), and I commented
| on it here:
| https://gist.github.com/gasman/2560551?permalink_comment_id=...
|
| Edit: I found my prototype from way back, I guess I was just
| testing heh: https://retr0.id/stuff/bee_movie.webp.html
| lucb1e wrote:
| That page breaks my mouse gestures add-on! (Or, I guess we
| don't have add-ons anymore but rather something like script
| injections that we call extensions, yay...) Interesting
| approach to first deliver 'garbage' and then append a bit of JS
| to transform it back into a page. The inner security nerd in me
| wonders if this might open up attacks if you had some
| kind of user-supplied data, such as a comment form. One could
| probably find a sequence of bytes to post as a comment that
| would, after compression, turn into a script tag, positioned
| (and thus running) before yours?
| Retr0id wrote:
| Yeah that's plausible, you definitely don't want any kind of
| untrusted data in the input.
|
| Something I wanted to do but clearly never got around to, was
| figuring out how to put an open-comment sequence (<!--) in a
| header somewhere, so that most of the garbage gets commented
| out
| lifthrasiir wrote:
| In my experience WebP didn't work well for the general case
| where this technique is actually useful (i.e. data less than 10
| KB), because most of what WebP lossless adds over PNG is
| about modelling and not encoding, while this textual
| compression only uses the encoding part of WebP.
| raggi wrote:
| Chromies got in the way of it for a very long time, but zstd is
| now coming to the web too, as it's finally landed in chrome - now
| we've gotta get safari onboard
| CharlesW wrote:
| Looks like it's on the To Do list, at least:
| https://webkit.org/standards-positions/#position-168
| mananaysiempre wrote:
| I'd love to go all-Zstandard, but in this particular case, as
| far as I know, Brotli and Zstandard are basically on par at
| identical values of decompressor memory consumption.
| simondotau wrote:
| Realistically, everything should just support everything.
| There's no reason why every (full featured) web server and
| every (full featured) web browser couldn't support all
| compelling data compression algorithms.
|
| Unfortunately we live in a world where Google decides to rip
| JPEG-XL support out of Chrome for seemingly no reason other
| than spite. If the reason was a lack of maturity in the
| underlying library, fine, but that wasn't the reason they
| offered.
| madeofpalk wrote:
| > There's no reason
|
| Of course, there is - and it's really boring.
| Prioritisation, and maintenance.
|
| It's a big pain to add, say, 100 compressions formats and
| support them indefinitely, especially with little
| differentiation between them. Once we agree on what the
| upper bound of useless formats is, we can start to
| negotiate what the lower limit is.
| raggi wrote:
| It's not prioritization of the code - it's relatively
| little implementation and low maintenance. But security is
| critical here, and everyone is very afraid of compressors
| because gzip, jpeg and various others were pretty bad.
| Zstd, unlike lz4 before it (at least early on), has a good
| amount of tests and fuzzing. The implementation could
| probably be safer (with fairly substantial effort), but
| having test and fuzzing coverage is a huge step forward
| over prior industry norms.
| simondotau wrote:
| I qualified it with _compelling_ which means only
| including formats /encodings which have demonstrably
| superior performance in some non-trivial respect.
|
| And I qualified it with _mature implementation_ because I
| agree that if there is no implementation which has a
| clear specification, is well written, actively
| maintained, and free of jank, then it ought not qualify.
|
| Relative to the current status quo, I would only imagine
| the number of data compression, image compression, and
| media compression options to increase by a handful.
| Single digits. But the sooner we add them, the sooner
| they can become sufficiently widely deployed as to be
| useful.
| rafaelmn wrote:
| How many CVEs came out of different file format handling
| across platforms? Including shit in browsers has insane
| impact.
| raggi wrote:
| Brotli and zstd are close, each trading in various
| configurations and cases - but zstd is muuuuuch more usable
| in a practical sense, because the code base can be easily
| compiled and linked against in a wider number of places with
| less effort, and the cli tools turn up all over for similar
| reasons. Brotli, like many Google libraries in the last decade,
| is infected with googlisms. Brotli is less bad, in a way,
| for example the C comes with zero build system at all, which
| is marginally better than a bazel disaster of generated
| stuff, but it's still not in a state the average distro
| contributor gives a care to go turn into a library, prepare
| pkgconf rules for and whatnot - plus no versioning and so on.
| Oh and the tests are currently failing.
| csjh wrote:
| I think the most surprising part here is that gzipping the
| base64'd compressed data almost entirely removes the base64
| overhead.
| zamadatix wrote:
| It feels somewhat intuitive since (as the article notes) the
| Huffman encoding stage effectively "reverses" the original
| base64 overhead issue that an 8 bit (256 choices) index is used
| for 6 bits (64 choices) of actual characters. A useful
| compression algorithm which _didn't_ do this sort of thing
| would be very surprising as it would mean it doesn't notice
| simple patterns in the data but somehow compresses things
| anyways.
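|
| Easy to check with a quick Node sketch (random bytes, so the
| only compressible structure is the base64 encoding itself):
|
|     const zlib = require("node:zlib");
|     const crypto = require("node:crypto");
|     const raw = crypto.randomBytes(100_000);         // incompressible
|     const b64 = Buffer.from(raw.toString("base64")); // ~133 KB
|     console.log(zlib.gzipSync(raw).length);  // ~100 KB
|     console.log(zlib.gzipSync(b64).length);  // ~100 KB again: Huffman
|                                              // undoes the 4/3 blow-up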
| ranger_danger wrote:
| how does it affect error correction though?
| zamadatix wrote:
| Neither base64 nor the gzipped version has error
| correction as implemented. The extra overhead bits in
| base64 come from selecting only a subset of printable
| characters, not by adding redundancy to the useful bits.
| galaxyLogic wrote:
| Is there a tool or some other way to easily encode a JPG image so
| it can be embedded into HTML? I know there is something like
| that, but is it easy? Could it be made easier?
| throwanem wrote:
| You can convert it to base64 and inline it anywhere an image
| URL is accepted, eg <img
| src="data:image/jpeg;base64,abc123..." />
|
| (Double-check the exact syntax and the MIME type before you use
| it; it's been a few years since I have, and this example is
| from perhaps imperfect memory.)
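|
| If you want to generate it rather than hand-assemble it, a tiny
| Node sketch does the job ("photo.jpg" is just a placeholder name):
|
|     const fs = require("node:fs");
|     const b64 = fs.readFileSync("photo.jpg").toString("base64");
|     fs.writeFileSync("snippet.html",
|       `<img src="data:image/jpeg;base64,${b64}" alt="...">`);
|
| Keep in mind base64 inflates the bytes by about a third before
| any transport compression.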
| TacticalCoder wrote:
| I _loved_ that (encoding stuff in _webp_) but my takeaway from
| the figures in the article is this: brotli is so good I'll host
| from somewhere where I can serve brotli (when and if the client
| supports brotli ofc).
| niutech wrote:
| This page is broken at least on Sailfish OS browser, there is a
| long empty space after the paragraph:
|
| > Alright, so we're dealing with 92 KiB for gzip vs 37 + 71 KiB
| for Brotli. Umm...
|
| That said, the overhead of gzip vs brotli HTML compression is
| nothing compared with the amount of JS/images/video current
| websites use.
| mediumsmart wrote:
| same on orion and safari and librewolf - is this a chrome page?
| simmonmt wrote:
| A different comment says librewolf disables webgl by default,
| breaking OP's decompression. Is that what you're seeing?
| zamadatix wrote:
| Works fine on Safari and Firefox for me.
| txtsd wrote:
| Same on Mull
| Dibby053 wrote:
| I didn't know canvas anti-fingerprinting was so rudimentary. I
| don't think it increases uniqueness (the noise is different every
| run) but bypassing it seems trivial: run the thing n times and
| take the mode. With so little noise, 4 or 5 times should be more
| than enough.
| hedora wrote:
| The article says it's predictable within a given client, so
| your trick wouldn't work.
|
| So, just use the anti-fingerprint noise as a cookie, I guess?
| Dibby053 wrote:
| Huh, it seems it's just my browser that resets the noise
| every run.
|
| I opened the page in Firefox like the article suggests and I
| get a different pattern per site and session. That prevents
| using the noise as a supercookie, I think, if its pattern
| changes every time cookies are deleted.
| sunaookami wrote:
| It is different per site and per session, yes.
| toddmorey wrote:
| "I hope you see where I'm going with this and are yelling 'Oh why
| the fuck' right now."
|
| I love reading blogpost like these.
| butz wrote:
| Dropping Google Fonts should improve page load time a bit too,
| considering those are loaded from a remote server that requires
| an additional handshake.
| kevindamm wrote:
| ..but if enough other sites are also using that font then it
| may already be available locally.
| pornel wrote:
| This stopped working many years ago. Every top-level
| domain now has its own private cache _of all other domains_.
|
| You likely have dozens of copies of Google Fonts, each in a
| separate silo, with absolutely zero reuse between websites.
|
| This is because a global cache used to work like a cookie, and
| has been used for tracking.
| kevindamm wrote:
| ah, I had forgotten about that, you're right.
|
| well at least you don't have to download it more than once
| for the site, but first impressions matter yeah
| madeofpalk wrote:
| Where "many years ago" is... 11 years ago for Safari!
| https://bugs.webkit.org/show_bug.cgi?id=110269
| SushiHippie wrote:
| No, because of cache partitioning, which has been a thing for
| a while.
|
| https://developer.chrome.com/blog/http-cache-partitioning
| butz wrote:
| I wonder what the difference in CPU usage on the client side is
| for the WebP variant vs standard HTML. Are you causing more
| battery drain on visitor devices?
| lucb1e wrote:
| It depends. Quite often (this is how you can tell I live in
| Germany) mobile data switches to "you're at the EDGE of the
| network range" mode1 and transferring a few KB means keeping
| the screen on and radio active for a couple of minutes. If the
| page is now 45KB instead of 95KB, that's a significant
| reduction in battery drain!
|
| _Under normal circumstances_ you're probably very right
|
| [1] Now I wonder if the makers foresaw how their protocol name
| might sound to us now
| bawolff wrote:
| They did all this and didn't even measure time to first paint?
|
| What is the point of doing this sort of thing if you don't even
| test how much faster or slower it made the page load?
| lxgr wrote:
| From the linked Github issue giving the rationale why Brotli is
| not available in the CompressionStream API:
|
| > As far as I know, browsers are only shipping the decompression
| dictionary. Brotli has a separate dictionary needed for
| compression, which would significantly increase the size of the
| browser.
|
| How can the decompression dictionary be smaller than the
| compression one? Does the latter contain something like a space-
| time tradeoff in the form of precalculated most efficient
| representations of given input substrings or something similar?
| zamadatix wrote:
| Perhaps I'm reading past some of the surrounding context but
| that doesn't actually say the problem is about the relative
| sizes, just that the compression dictionary isn't already in
| browsers while the decompression dictionary already is.
|
| It's a bit disappointing you can't use Brotli in the
| DecompressionStream() interface just because it may or may not
| be available in the CompressionStream() interface though.
| lxgr wrote:
| I'm actually not convinced that there are two different
| dictionaries at all. The Brotli RFC only talks about a static
| dictionary, not a separate encoding vs. a decoding
| dictionary.
|
| My suspicion is that this is a confusion of the (runtime)
| sliding window, which limits maximum required memory on the
| decoder's side to 16 MB, with the actual shared static
| dictionary (which needs to be present in the decoder only, as
| far as I can tell; the encoder can use it, and if it does, it
| would be the same one the decoder has as well).
| zamadatix wrote:
| IIRC from working with Brotli before, it's not that it's
| truly a "different dictionary" but rather more like a
| "reverse view" of what are ultimately the same dictionary
| mappings.
|
| On one hand it seems a bit silly to worry about ~100 KB in
| the browser for what will probably, on average, save more than
| that in upload/download the first time it is used. On the
| other hand, "it's just a few hundred KB" each release for a
| few hundred releases ends up being a lot of cruft you can't
| remove without breaking old stuff. On the third hand coming
| out of our head... it's not like Chrome has been against
| shipping more code for features they'd like to impose on
| users even if users don't actually want them, so what's a
| small addition users can actually benefit from compared to
| that?
| lifthrasiir wrote:
| I believe the compression dictionary refers to [1], which is
| used to quickly match dictionary-compressable byte sequences. I
| don't know where 170 KB comes from, but that hash alone does
| take 128 KiB and might be significant if it can't be easily
| recomputed. But I'm sure that it can be quickly computed at
| load time if the binary size is that important.
|
| [1]
| https://github.com/google/brotli/blob/master/c/enc/dictionar...
| lxgr wrote:
| I was wondering that too, but that dictionary itself
| compresses down to 29 KB (using regular old gzip), so it
| seems pretty lightweight to include even if it were
| hard/annoying to precompute at runtime or install time.
| lifthrasiir wrote:
| Once installed it will occupy 128 KiB of data though, so it
| might be still relevant for the Chromium team.
| gildas wrote:
| In the same vein, you can package HTML pages as self-extracting
| ZIP files with SingleFile [1]. You can even include a PNG image
| to produce files compatible with HTML, ZIP and PNG [2], and for
| example display the PNG image in the HTML page [3].
|
| [1] https://github.com/gildas-lormeau/SingleFile?tab=readme-
| ov-f...
|
| [2] https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-PNG
|
| [3] https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-
| PNG/raw/...
| astrostl wrote:
| Things I seek in an image format:
|
| (1) compatibility
|
| (2) features
|
| WebP still seems far behind on (1) to me so I don't care about
| the rest. I hope it gets there, though, because folks like this
| seem pretty enthusiastic about (2).
| mistrial9 wrote:
| agree - webp has a lib on Linux but somehow, standard image
| viewers just do not read it, so -> "FAIL"
| sgbeal wrote:
| > webp has a lib on Linux but somehow, standard image viewers
| just do not read it
|
| That may apply to old "LTS" Linuxes, but not any relatively
| recent one. Xviewer and gimp immediately come to mind as
| supporting it, and I haven't had a graphics viewer on Linux
| _not_ be able to view webp in at least 3 or 4 years.
| jfoster wrote:
| The compatibility gap on WebP is already quite small. Every
| significant web browser now supports it. Almost all image tools
| & viewers do as well.
|
| Lossy WebP comes out a lot smaller than JPEG. It's definitely
| worth taking the saving.
| somishere wrote:
| Lots of nice tricks in here, definitely fun! Only minor nitpick
| is that it departs fairly rapidly from the lede ... which
| espouses the dual virtues of an accessible and js-optional
| reading experience ;)
| bogzz wrote:
| Still waiting on a webp encoder to be added to the Go stdlib...
| kopirgan wrote:
| A 19-year-old, and look at the list of stuff she's done! Perhaps
| started coding in the womb?! Amazing.
| Pesthuf wrote:
| It's impressive how close this is to Brotli even though brotli
| has this massive pre-shared dictionary. Is the actual compression
| algorithm used by it just worse, or does the dictionary just
| matter much less than I think?
| lifthrasiir wrote:
| A pre-shared dictionary is most effective for small inputs that
| can't reach the stationary distribution required by typical
| compressors. I don't know the exact threshold, but my best
| guess is around 1--10 KB.
| ajsnigrutin wrote:
| So... 60 kB less transfer... and how much slower is it on, e.g.,
| the Galaxy S8 phone my mom has, because of all the shenanigans
| done to save those 60 kB?
| ranger_danger wrote:
| >ensure it works without JavaScript enabled
|
| >manually decompress it in JavaScript
|
| >Brotli decompressor in WASM
|
| the irony seems lost
| purplesyringa wrote:
| A nojs version is compiled separately and is linked via a meta
| refresh. Not ideal, and I could add some safeguards for people
| without webgl, but I'm not an idiot.
| divbzero wrote:
| > _Typically, Brotli is better than gzip, and gzip is better than
| nothing. gzip is so cheap everyone enables it by default, but
| Brotli is way slower._
|
| Note that _way slower_ applies to speed of compression, not
| decompression. So Brotli is a good bet if you can precompress.
|
| > _Annoyingly, I host my blog on GitHub pages, which doesn't
| support Brotli._
|
| If your users all use modern browsers and you host static pages
| through a service like Cloudflare or CloudFront that supports
| custom HTTP headers, you can implement your own Brotli support by
| precompressing the static files with Brotli and adding a
| _Content-Encoding: br_ HTTP header. This is kind of cheating
| because you are ignoring proper content negotiation with _Accept-
| Encoding_ , but I've done it successfully for sites with targeted
| user bases.
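|
| A minimal sketch of that "cheat", self-hosted with Node rather
| than a CDN header rule (it unconditionally assumes every client
| accepts br, which is exactly the caveat above):
|
|     const fs = require("node:fs");
|     const http = require("node:http");
|     const zlib = require("node:zlib");
|
|     // precompress once at deploy time
|     fs.writeFileSync("index.html.br", zlib.brotliCompressSync(
|       fs.readFileSync("index.html"),
|       { params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 } }));
|
|     // serve the precompressed bytes with the matching header
|     http.createServer((req, res) => {
|       res.writeHead(200, {
|         "Content-Type": "text/html; charset=utf-8",
|         "Content-Encoding": "br", // no Accept-Encoding negotiation
|       });
|       res.end(fs.readFileSync("index.html.br"));
|     }).listen(8080);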
| lifthrasiir wrote:
| It is actually possible to use Brotli directly in the web
| browser... with caveats of course. I believe my 2022 submission
| to JS1024 [1] is the first ever demonstration of this concept,
| and I also have a proof-of-concept code for the arbitrary
| compression (which sadly didn't work for the original size-coding
| purpose though). The main caveat is that you are effectively
| limited to ASCII characters, and that it is highly sensitive
| to the rendering stack for the obvious reason---it no longer
| seems to function in Firefox right now.
|
| [1] https://js1024.fun/demos/2022/18/readme
|
| [2]
| https://gist.github.com/lifthrasiir/1c7f9c5a421ad39c1af19a9c...
| purplesyringa wrote:
| This technique is amazing and way cooler than my post. Kudos!
| zamadatix wrote:
| The key note for understanding the approach (without diving
| into how it's actually wrangled):
|
| > The only possibility is to use the WOFF2 font file format
| which Brotli was originally designed for, but you need to make
| a whole font file to leverage this. This got more complicated
| recently by the fact that modern browsers sanitize font files,
| typically by the OpenType Sanitizer (OTS), as it is very
| insecure to put untrusted font files directly to the system.
| Therefore we need to make an WOFF2 file that is sane enough to
| be accepted by OTS _and_ has a desired byte sequence inside
| which can be somehow extracted. After lots of failed
| experiments, I settled on the glyph widths ("advance") which
| get encoded in a sequence of two-byte signed integers with
| almost no other restrictions.
|
| Fantastic idea!
| lifthrasiir wrote:
| Correction: It still works on Firefox, I just forgot that its
| zoom factor should be exactly 100% on Firefox to function. :-)
| jfoster wrote:
| I work on Batch Compress (https://batchcompress.com/en) and
| recently added WebP support, then made it the default soon after.
|
| As far as I know, it was already making the smallest JPEGs out of
| any of the web compression tools, but WebP was coming out at only
| ~50% of the size of the JPEGs. It was an easy decision to make
| WebP the default not too long after adding support for it.
|
| Quite a lot of people use the site, so I was anticipating some
| complaints after making WebP the default, but it's been about a
| month and so far there has been only one complaint/enquiry about
| WebP. It seems that almost all tools & browsers now support WebP.
| I've only encountered one website recently where uploading a WebP
| image wasn't handled correctly and blocked the next step. Almost
| everything supports it well these days.
| pornel wrote:
| Whenever WebP gives you file size savings bigger than 15%-20%
| compared to a JPEG, the savings are coming from quality
| degradation, not from improved compression. If you compress and
| optimize JPEG well, it shouldn't be far behind WebP.
|
| You can always reduce file size of a JPEG by making a WebP that
| looks _almost_ the same, but you can also do that by
| recompressing a JPEG to a JPEG that looks _almost_ the same.
| That's just a property of all lossy codecs, and the fact that
| file size grows exponentially with quality, so people are
| always surprised how even tiny almost invisible quality
| degradation can change the file sizes substantially.
| Jamie9912 wrote:
| Why don't they make zstd images? Surely that would beat WebP.
| sgbeal wrote:
| > Why don't they make zstd images surely that would beat webp
|
| zstd is a general-purpose compressor. By and large (and i'm
| unaware of any exceptions), specialized/format-specific
| compression (like png, webp, etc.) will compress better than a
| general-purpose compressor because format-specific compressors
| can take advantage of quirks of the format which a general-
| purpose solution cannot. Also, format-specific ones are often
| lossy (or conditionally so), enabling them to trade lower
| fidelity for better compression, something a general-purpose
| compressor cannot do.
| phh wrote:
| > A real-world web page compressed with WebP? Oh, how about the
| one you're reading right now? Unless you use an old browser or
| have JavaScript turned off, WebP compresses this page starting
| from the "Fool me twice" section. If you haven't noticed this,
| I'm happy the trick is working :-)
|
| Well, it didn't work in Materialistic (I guess their webview
| disables JS), and the failure mode is really not comfortable.
| cobbal wrote:
| I would love to try reading the lossy version.
| kardos wrote:
| On the fingerprinting noise: this sounds like a job for FEC [1].
| It would increase the size but allow using the Canvas API. I
| don't know if this would solve the flicker though (not a front
| end expert here)
|
| Also, it's a long shot, but could the combo of FEC (+size) and
| lossy compression (-size) be a net win?
|
| [1] https://en.m.wikipedia.org/wiki/Error_correction_code
| michaelbrave wrote:
| I personally don't much care for the format: if I save an image
| and it ends up WebP, then I have to convert it before I can edit
| or use it in any meaningful way, since it's not supported in
| anything other than web browsers. It just adds extra steps.
| ezfe wrote:
| Takes 2 seconds (literally, on macOS it's in the right click
| menu) to convert and it's smaller so not really a problem
| hot_gril wrote:
| Ironically, even Google's products like Slides don't support
| webp images. But if/when it gets more support, I guess it's
| fine. I can tolerate a new format once every 20 years.
|
| .webm can go away, though.
| rrrix1 wrote:
| I very much enjoyed reading this. Quite clever!
|
| But...
|
| > Annoyingly, I host my blog on GitHub pages, which doesn't
| support Brotli.
|
| Is the _glaringly_ obvious solution to this not as obvious as I
| think it is?
|
| TFA went through a lot of round-about work to get (some) Brotli
| compression. Very impressive Yak Shave!
|
| If you're married to the idea of a Git-based automatically
| published web site, you could _at least_ replicate your code and
| site to Gitlab Pages, which has supported precompressed Brotli
| since 2019. Or use one of Cloudflare's free tier services.
| There's a variety of ways to solve this problem before the first
| byte is sent to the client.
|
| Far too much of the world's source code already depends
| exclusively on Github. I find it distasteful to also have the
| small web do the same while blindly accepting an inferior
| experience and worse technology.
| purplesyringa wrote:
| The solution's obvious, but if I followed it, I wouldn't have a
| fun topic to discuss or fake internet points to boast about,
| huh? :)
|
| I'll probably switch to Cloudflare Pages someday when I have
| time to do that.
| DaleCurtis wrote:
| What a fun excursion :) You can also use the ImageDecoder API:
| https://developer.mozilla.org/en-US/docs/Web/API/ImageDecode...
| and VideoFrame.copyTo: https://developer.mozilla.org/en-
| US/docs/Web/API/VideoFrame/... to skip canvas entirely.
| purplesyringa wrote:
| It's unfortunately Chromium-only for now, and I wanted to keep
| code simple. I've got a PoC lying around with VideoFrame and
| whatnot, but I thought this would be better for a post.
___________________________________________________________________
(page generated 2024-09-08 23:00 UTC)