[HN Gopher] My failed attempt to shrink all NPM packages by 5%
___________________________________________________________________
My failed attempt to shrink all NPM packages by 5%
Author : todsacerdoti
Score : 272 points
Date : 2025-01-27 12:44 UTC (10 hours ago)
(HTM) web link (evanhahn.com)
(TXT) w3m dump (evanhahn.com)
| huqedato wrote:
| Try to use this https://github.com/xthezealot/npmprune
| phdelightful wrote:
| My reading of OP is that it's less about whether zopfli is
| technically the best way to achieve a 5% reduction in package
| size, and more about how that relatively simple proposal
| interacted with the NPM committee. Do you think something like
| this would fare better or differently for some reason?
| sd9 wrote:
| The final pro/cons list:
| https://github.com/npm/rfcs/pull/595#issuecomment-1200480148
|
| I don't find the cons all that compelling to be honest, or at
| least I think they warrant further discussion to see if there are
| workarounds (e.g. a choice of compression scheme for a library
| like typescript, if they would prefer faster publishes).
|
| It would have been interesting to see what eventually played out
| if the author hadn't closed the RFC themselves. It could have
| been the sort of thing that eventually happens after 2 years, but
| then quietly makes everybody's lives better.
| n4r9 wrote:
| I felt the same. The proposal wasn't rejected! Also,
| performance gains go beyond user stories - e.g. they reduce
| infra costs and environmental impact - so I think the main
| concerns of the maintainers could have been addressed.
| IshKebab wrote:
| > The proposal wasn't rejected!
|
| They soft-rejected it by requiring more validation than was
| reasonable. I see this all the time. "But did you consider
| <extremely unlikely issue>? Please go and run more tests."
|
| It's pretty clear that the people making the decision didn't
| actually care about the bandwidth savings, otherwise they
| would have put the work in themselves to do this, e.g. by
| requiring Zopfli for popular packages. I doubt Microsoft
| cares if it takes an extra 2 minutes to publish Typescript.
|
| Kind of a wild decision considering NPM uses 4.5 PB of
| traffic per week. 5% of that is 225 TB/week, which according
| to my brief checks costs around $10k/week!
|
| I guess this is a "not my money" problem fundamentally.
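|
| (Working out the implied unit price: 4.5 PB/week x 5% =
| 225 TB/week = 225,000 GB/week, so $10k/week corresponds to
| roughly $0.044/GB, which is in the ballpark of list-price
| cloud egress rates.)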
| lyu07282 wrote:
| > which according to my brief checks costs around $10k/week
|
| That's the market price though; for Microsoft it's a tiny
| fraction of that.
| johnfn wrote:
| This doesn't seem quite correct to me. They weren't asking
| for "more validation than was reasonable". They were asking
| for literally any proof that users would benefit from the
| proposal. That seems like an entirely reasonable thing to
| ask before changing the way every single NPM package gets
| published, ever.
|
| I do agree that 10k/week is non-negligible. Perhaps that
| means the people responsible for the 10k weren't in the
| room?
| bombcar wrote:
| Or another way to look at it is it's just (at most!) 5% off
| an already large bill, and it might cost more than that
| elsewhere.
|
| And I can buy 225 TB of bandwidth for less than $2k, I
| assume Microsoft can get better than some HN idiot buying
| Linode.
| arccy wrote:
| massively increase the open source github actions bill for
| runners running longer (compute is generally more
| expensive) to publish for a small decrease in network
| traffic (bandwidth is cheap at scale)?
| alt227 wrote:
| I feel massively increasing publish time is a valid reason not
| to push this, though, considering such small gains and who the
| gains apply to.
| scott_w wrote:
| I agree, going from 1 second to 2.5 minutes is a huge
| negative change, in my opinion. I know publishing a package
| isn't something you do 10x a day but it's probably a big
| enough change that, were I doing it, I'd think the publish
| process is hanging and keep retrying it.
| pletnes wrote:
| If you're working on the build process itself, you'll
| notice it a lot!
| rererereferred wrote:
| Since it's backwards compatible, individual maintainers could
| enable it in their own pipeline if they don't have issues
| with the slowdown. It sounds like it could be a single flag
| in the publish command.
| michaelmior wrote:
| Probably not worth the added complexity, but in theory, the
| package could be published immediately with the existing
| compression and then in the background, replaced with the
| Zopfli-compressed version.
| Null-Set wrote:
| No, it can't because the checksums won't match.
| michaelmior wrote:
| I don't think that's actually a problem, but it would
| require continuing to host both versions (at distinct
| URLs) for any users who may have installed the package
| before the Zopfli-compressed version completed. Although
| I think you could also get around this by tracking
| whether the newly-released package was ever served by the
| API. If not, which is probably the common case, the old
| gzip-compressed version could be deleted.
| hiatus wrote:
| Wouldn't that result in a different checksum for package-
| lock.json?
| aja12 wrote:
| > Probably not worth the added complexity, but in theory,
| the package could be published immediately with the
| existing compression and then in the background, replaced
| with the Zopfli-compressed version.
|
| Checksum matters aside, wouldn't that turn the 5% bandwidth
| savings into an almost double bandwidth increase though?
| IMHO, considering the complexity to even make it a build
| time option, the author made the right call.
| macspoofing wrote:
| > I don't find the cons all that compelling to be honest
|
| I found it reasonable.
|
| The 5% improvement was balanced against the cons of increased
| cli complexity, lack of native JS zopfli implementation, and
| slower compression .. and 5% just wasn't worth it at the moment
| - and I agree.
|
| >or at least I think they warrant further discussion
|
| I think that was the final statement.
| sd9 wrote:
| Yes, but there's a difference between "this warrants further
| discussion" and "this warrants further discussion and I'm
| closing the RFC". The latter all but guarantees that no
| further discussion will take place.
| philipwhiuk wrote:
| No it doesn't. It only does that if you think discussion
| around future improvements belongs in RFCs.
| mcherm wrote:
| Where DOES it belong, if not there?
| jerf wrote:
| "I don't find the cons all that compelling to be honest"
|
| This is a solid example of how things change at scale. Concerns
| I wouldn't even think about for my personal website become
| things I need to think about for the download site being hit by
| 50,000 of my customers become big deals when operating at the
| scale of npm.
|
| You'll find those arguments the pointless nitpicking of
| entrenched interests who just don't want to make any changes,
| until you experience your very own "oh man, I really thought
| this change was perfectly safe and now my entire customer base
| is trashed" moment, and then suddenly things like "hey, we need
| to consider how this affects old signatures and the speed of
| decompression and just generally whether this is worth the non-
| zero risks for what are in the end not really that substantial
| benefits" start to sound entirely reasonable.
|
| I do not say this as the wise Zen guru sitting cross-legged and
| meditating from a position of being above it all; I say it
| looking at my own battle scars from the Perfectly Safe things
| I've pushed out to my customer base, only to discover some tiny
| little nit caused me trouble. Fortunately I haven't caused any
| true catastrophes, but that's as much luck as skill.
|
| Attaining the proper balance between moving forward even though
| it incurs risk and just not changing things that are working is
| the hardest part of being a software maintainer, because both
| extremes are definitely bad. Everyone tends to start out in the
| former situation, but then when they are inevitably bitten it
| is important not to overcorrect into terrified fear of ever
| changing anything.
| sd9 wrote:
| I agree with everything you said, but it doesn't contradict
| my point
| jerf wrote:
| I'm saying you probably don't find them compelling because
| from your point of view, the problems don't look important
| to you. They don't from my point of view either. But my
| point of view is the wrong point of view. From their point
| of view this would be plenty to make me think twice, and
| several times over past that, before changing something so
| deeply fundamental to the system for what is a benefit that
| nobody who is actually paying the price for the package
| size seems to be particularly enthusiastic about. If the
| people paying the bandwidth bill aren't even that excited
| about a 5% reduction, then the cost/benefits analysis tips
| over into essentially "zero benefit, non-zero cost", and
| that's not very compelling.
| ffsm8 wrote:
| Or you're not understanding how he meant it: there are
| countless ways to roll out such changes, a hard change is
| likely a very bad idea as you've correctly pointed out.
|
| But it is possible to do it more gradually, i.e. by
| sneaking it in with a new API that's used by newer npm
| versions, or similar.
|
| But it was his choice to make, and it's fine that he
| didn't see enough value in pursuing such a tiny file-size
| change
| sd9 wrote:
| The problems look important but underexplored
| pif wrote:
| > This is a solid example of how things change at scale.
|
| 5% is 5% at any scale.
| michaelmior wrote:
| Yes and no. If I'm paying $5 a month for storage, I
| probably don't care about saving 5% of my storage costs. If
| I'm paying $50,000/month in storage costs, 5% savings is a
| lot more worthwhile to pursue
| PaulHoule wrote:
| Doesn't npm belong to Microsoft? It must be hosted in
| Azure which they own so they must be paying a rock bottom
| rate for storage, bandwidth, everything.
| cwmma wrote:
| It's probably less about MS and more about the people
| downloading the packages
| PaulHoule wrote:
| For them it is 5% of something tiny.
| imoverclocked wrote:
| Maybe, maybe not. If you are on a bandwidth limited
| connection and you have a bunch of NPM packages to
| install, 5% of an hour is a few minutes saved. It's
| likely more than that because long-transfers often need
| to be restarted.
| PaulHoule wrote:
| A properly working cache and download manager that
| supports resume goes a long way.
|
| I could never get Docker to work on my ADSL when it was 2
| Mbps (FTTN got it up to 20) though it was fine in the
| Montreal office which had gigabit.
| gregmac wrote:
| 5% off your next lunch and 5% off your next car are very
| much not the same thing.
| dgfitz wrote:
| So what, instead of 50k for a car you spend 47.5k?
|
| If that moves the needle on your ability to purchase the
| car, you probably shouldn't be buying it.
|
| 5% is 5%.
| post-it wrote:
| I wouldn't pick 5¢ up off the ground but I would
| certainly pick up $2500.
| ziddoap wrote:
| Why do so many people take illustrative examples
| literally?
|
| I'm sure you can use your imagination to substitute
| "lunch" and "car" with other examples where the absolute
| change makes a difference despite the percent change
| being the same.
|
| Even taking it literally... The 5% might not tip the
| scale of whether or not I _can_ purchase the car, but I'll
| spend a few hours of my time comparing prices at
| different dealers to save $2500. Most people would
| consider it dumb if you didn't shop around when making a
| large purchase.
|
| On the other hand, I'm not going to spend a few hours of
| my time at lunch so that I can save an extra $1 on a
| meal.
| kemitche wrote:
| If it takes 1 hour of effort to save 5%:
|
| - Doing 1 hour of effort to save 5% on your $20 lunch is
|   foolhardy for most people. $1/hr is well below US minimum
|   wage.
|
| - Doing 1 hour of effort to save 5% on your $50k car is
|   wise. $2500/hr is well above what most people are making
|   at work.
|
| It's not about whether the $2500 affects my ability to
| buy the car. It's about whether the time it takes me to
| save that 5% ends up being worthwhile to me given the
| actual amount saved.
|
| The question is really "given the person-hours it takes
| to apply the savings, and the real value of the savings,
| is the savings worth the person-hours spent?"
| jay_kyburz wrote:
| This is something we often do in our house. We talk about
| things in terms of hours worked rather than price. I
| think more people should do it.
| JZerf wrote:
| Those lunches could add up to something significant over
| time. If you're paying $10 per lunch for 10 years, that's
| $36,500 which is pretty comparable to the cost of a car.
| horsawlarway wrote:
| 5% of newly published packages, with a potentially serious
| degradation to package publish times for those who have to
| do that step.
|
| Given his numbers, let's say he saves 100Tb of bandwidth
| over a year. At AWS egress pricing... that's $5,000 total
| saved.
|
| And arguably - NPM is getting at least some of that savings
| by adding CPU costs to publishers at package time.
|
| Feels like... not enough to warrant a risky ecosystem
| change to me.
| AlotOfReading wrote:
| How often are individuals publishing to NPM? Once a day
| at most, more typically once a week or month? A few dozen
| seconds of one person's day every month isn't a terrible
| trade-off.
|
| Even that's addressable though if there's motivation,
| since something like transcoding server side during
| publication just for popular packages would probably get
| 80% of the benefit with no client-side increase in
| publication time.
| true_religion wrote:
| https://www.reddit.com/r/webdev/comments/1ff3ps5/these_50
| 00_...
|
| NPM uses at least 5 petabytes per week. 5% of that is 250
| terabytes.
|
| So $15,000 a week, or $780,000 a year in savings could've
| been gained.
| canucker2016 wrote:
| In a great example of the Pareto Principle (80/20), or
| actually even more extreme, let's only apply this Zopfli
| optimization if the package download total is equal or
| more than 1GiB (from the Weekly Traffic in GiB column of
| the Top 5000 Weekly by Traffic tab of the Google Sheets
| file from the reddit post).
|
| For reference, total bandwidth used by all 5000 packages
| is 4_752_397 GiB.
|
| Packages >= 1GiB bandwidth/week - That turns out to be
| 437 packages (there's a header row, so it's rows 2-438)
| which uses 4_205_510 GiB.
|
| So 88% of the top 5000 bandwidth is consumed by
| downloading the top 8.7% (437) packages.
|
| 5% is about 210 TiB.
|
| Limiting to the top 100 packages by bandwidth results in
| 3_217_584 GiB, which is 68% of total bandwidth used by 2%
| of the total packages.
|
| 5% is about 161 TiB.
| knighthack wrote:
| Do you even know how absolute numbers work vis-a-vis
| percentages?
| Aicy wrote:
| That's right, and 5% of a very small number is a very small
| number. 5% of a very big number is a big number.
| syncsynchalt wrote:
| In some scenarios the equation flips, and the enterprise is
| looking for _more_ scale.
|
| The more bandwidth that Cloudflare needs, the more leverage
| they have at the peering table. As GitHub's largest repo
| (the @types / DefinitelyTyped repo owned by Microsoft) gets
| larger, the more experience the owner of GitHub (also
| Microsoft) gets in hosting the world's largest git repos.
|
| I would say this qualifies as one of those cases, as npmjs
| is hosted on Azure. The more resources that NPM needs, the
| more Microsoft can build towards parity with AWS's
| footprint.
| advisedwang wrote:
| The pros aren't all that compelling either. The npm repo is the
| only group that this would really be remotely significant for,
| and there seemed to be no interest. So it doesn't take much of
| a con to nix a solution to a non-problem.
| ForOldHack wrote:
| Every single download, until the end of time is affected: It
| speeds up the servers, speeds up the updates, saves disk
| space on the update servers, and saves on bandwidth costs and
| usage.
|
| Everyone benefits, the only cost is an ultra-microscopic time
| on the front end, and a tiny cost on the client end, and for
| a very significant number of users, time and money saved. The
| examples of compression here...
| orta wrote:
| Congrats on a great write-up. Sometimes trying to ship something
| at that sorta scale turns out to just not really make sense in a
| way that is hard to see at the beginning.
|
| Another personal win is that you got a very thorough
| understanding of the people involved and how the outreach parts
| of the RFC process work. I've also had a few fail, but I've also
| had a few pass! Always easier to do the next time
| stabbles wrote:
| One thing that's excellent about zopfli (apart from being gzip
| compatible) is how easy it is to bootstrap:
|
|     git clone https://github.com/google/zopfli.git
|     cc -O2 zopfli/src/zopfli/*.c -lm
|
| It just requires a C compiler and linker.
| stabbles wrote:
| The main downside though, it's impressively slow.
|
| Comparing to gzip isn't really worth it. Combine pigz
| (threaded) with zlib-ng (simd) and you get decent performance.
| pigz is used in `docker push`.
|
| For example, gzipping llvm.tar (624MB) takes less than a second
| for me:
|
|     $ time /home/harmen/spack/opt/spack/linux-ubuntu24.04-zen2/gcc-13.2.0/pigz-2.8-5ptdjrmudifhjvhb757ym2bzvgtcsoqc/bin/pigz -k hello.tar
|     real    0m0.779s
|     user    0m11.126s
|     sys     0m0.460s
|
| At the same time, zopfli compiled with -O3 -march=native takes
| 35 minutes. No wonder it's not popular.
|
| It is almost _2700x_ slower than the state of the art for just
| 6.8% bytes saved.
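|
| (For the arithmetic: 35 minutes is about 2,100 seconds, and
| 2,100 s / 0.779 s is roughly 2,700.)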
| Levitating wrote:
| > 2700x slower
|
| That is impressively slow.
|
| In my opinion even the 28x decrease in performance mentioned
| would be a no-go. Sure the package saves a few bytes but I
| don't need my entire pc to grind to a halt every time I
| publish a package.
|
| Besides, storage is cheap but CPU power draw is not. Imagine
| the additional CO2 that would have to be produced if this RFC
| was merged.
|
| > 2 gigabytes of bandwidth per year across all installations
|
| This must be a really rough estimate and I am curious how it
| was calculated. In any case 2 gigabytes over _a year_ is
| absolutely nothing. Just my home network can produce a
| terabyte a day.
| bonzini wrote:
| 2 GB for the author's package which is neither extremely
| common nor large; it would be 2 TB/year just for react
| core.
| Levitating wrote:
| I am confused, how is this number calculated?
|
| Because the authors mentioned package, Helmet[1], is
| 103KB _uncompressed_ and has had 132 versions in 13
| years. Meaning downloading every Helmet version
| uncompressed would result in 132*103KB = 13.7MB.
|
| I feel like I must be missing something really obvious.
|
| Edit: Oh it's 2GB/year _across_ all installations.
|
| [1]:
| https://www.npmjs.com/package/helmet?activeTab=versions
| ape4 wrote:
| Usually people require more than 5% to make a big change
| hinkley wrote:
| That's why our code is so slow. Dozens of poor decisions that
| each account for 2-4% of overall time lost, but 30-60% in
| aggregate.
| fergie wrote:
| Props to anyone who tries to make the world a better place.
|
| It's not always obvious who has the most important use cases. In
| the case of NPM they are prioritizing the user experience of
| module authors. I totally see how this change would be great for
| module consumers, yet create potentially massive inconvenience
| for module authors.
|
| Interesting write-up
| atiedebee wrote:
| I think "massive" is overstating it. I don't think deploying a
| new version of a package is something that happens many times a
| day, so it wouldn't be a constant pain point.
|
| Also, since this is a case of having something compressed once
| and decompressed potentially thousands of times, it seems like
| the perfect tool for the job.
| philipwhiuk wrote:
| Every build in a CI system would probably create the package.
|
| This is changing every build in every CI system to make it
| slower.
| mkesper wrote:
| Just use it on the release build.
| avodonosov wrote:
| My experiment on how to reduce javascript size of every web app
| by 30-50% : https://github.com/avodonosov/pocl
|
| Working approach, but in the end I abandoned the project - I
| doubt people care about such js size savings.
| dagelf wrote:
| Wdym?? 50% is a big deal
| bluGill wrote:
| 50% size savings isn't important to the people who pay for
| it. They pay at most pennies for 100% savings (that is
| somehow all the functionality in zero bytes - not worth
| anything to those paying the bills)
| tyre wrote:
| Size savings translates to latency improvements which
| directly affects conversion rates. Smaller size isn't about
| reducing costs but increased revenue. People care.
| soared wrote:
| Agreed - often a CTO of an ecom site is very very focused
| on site speed and has it as their #1 priority since it
| directly increases revenue.
| fwip wrote:
| Note that this proof-of-concept implementation saves
| latency on first load, but may add latency at surprising
| points while using the website. Any user invoking a
| rarely-used function would see a delay before the
| javascript executes, without the traditional UI
| affordances (spinners etc) to indicate that the
| application was waiting on the network. Further, these
| secretly-slow paths may change from visit to visit. Many
| users know how to "wait for the app to be ready," but the
| traditional expectation is that once it's loaded, the
| page/app will work, and any further delays will be
| signposted.
|
| I'm sure it works great when you've got high-speed
| internet, but might break things unacceptably for users
| on mobile or satellite connections.
| vlovich123 wrote:
| > without the traditional UI affordances (spinners etc)
| to indicate that the application was waiting on the
| network.
|
| This part is obviously trivially solvable. I think the
| same basic idea is going to at some point make it but
| it'll have to be through explicit annotations first and
| then there will be tooling to automatically do this for
| your code based upon historical visits where you get to
| tune the % of visitors that get additional fetches. Also,
| you could probably fetch the split off script in the
| background anyway as a prefetch + download everything
| rather than just 1 function at a time (or even
| downloading related groups of functions together)
|
| The idea has lots of merit and you just have to execute
| it right.
| philipwhiuk wrote:
| How do you evaluate call usage?
| KTibow wrote:
| I think this is called tree shaking and Vite/Rollup do this by
| default these days. Of course, it's easy when you explicitly
| say what you're importing.
| avodonosov wrote:
| That's not tree-shaking.
| hinkley wrote:
| I got measurable decreases in deployment time by shrinking the
| node_modules directory in our docker images.
|
| I think people forget that, when you're copying the same images
| to dozens and dozens of boxes, any improvement starts to add up
| to real numbers.
| syncsynchalt wrote:
| I've not done it, but have you considered using `pnpm` and
| volume-mounting a shared persistent `pnpm-store` into the
| containers? It seems like you'd get near-instant npm installs
| that way.
| hinkley wrote:
| The only time npm install was on the critical path was
| hotfixes. It's definitely worth considering. But I was
| already deep into doing people giant favors that they
| didn't even notice, so I was juggling many other goals. I
| think the only thank you I got was from the UI lead, who
| had some soda straw internet connection and this and
| another thing I did saved him a bunch of hard to recover
| timeouts.
| jsheard wrote:
| I wonder if it would make more sense to pursue Brotli at this
| point, Node has had it built-in since 10.x so it should be pretty
| ubiquitous by now. It would require an update to NPM itself
| though.
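|
| A quick local comparison is possible with Node built-ins alone
| (a sketch; "package.tgz" is a placeholder for any real tarball):
|
|     // compare.js - size of gzip -9 vs Brotli for the same tar,
|     // using only Node's bundled zlib module.
|     const fs = require('fs');
|     const zlib = require('zlib');
|
|     // Work on the underlying .tar so both codecs see the same input.
|     const tar = zlib.gunzipSync(fs.readFileSync('package.tgz'));
|
|     const gz = zlib.gzipSync(tar, { level: 9 });
|     const br = zlib.brotliCompressSync(tar, {
|       params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 },
|     });
|
|     console.log('gzip -9   :', gz.length, 'bytes');
|     console.log('brotli q11:', br.length, 'bytes');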
| pornel wrote:
| It only doesn't apply to existing _versions_ of existing
| packages. Newer releases would apply Zopfli, so over time likely
| the majority of actively used /maintained packages would be
| recompressed.
| choobacker wrote:
| Nice write up!
|
| > When it was finally my turn, I stammered.
|
| > Watching it back, I cringe a bit. I was wordy, unclear, and
| unconvincing.
|
| > You can watch my mumbling in the recording
|
| I watched this, and the author was articulate and presented well.
| The author is too harsh!
|
| Good job for trying to push the boundaries.
| hartator wrote:
| I mean 4-5% off the size for 10-100x the time is not worth it.
| swiftcoder wrote:
| That's not actually so straightforward. You pay the 10-100x
| slowdown _once_ on the compressing side, to save 4-5% on
| _every_ download - which for a popular package one would expect
| downloads to be in the millions.
| philipwhiuk wrote:
| The downloads are cached. The build happens on every publish
| for every CI build.
| inglor_cz wrote:
| As the author himself said, just React was downloaded half a
| billion times; that is a lot of saved bandwidth on both sides,
| but especially so for the server.
|
| Maybe it would make sense to only apply this improvement in
| images that are a) either very big or b) get downloaded at
| least million times each year or so. That would cover most of
| the savings while leaving most packages and developers out of
| it.
| bonzini wrote:
| Assuming download and decompression cost to be proportional to
| the size of the incoming compressed stream, it would break even
| at 2000 downloads. A big assumption I know, but 2000 is a
| really small number.
| liontwist wrote:
| It absolutely is. Packages are zipped once and downloaded
| thousands of times.
| cedws wrote:
| Last I checked npm packages were full of garbage including non-
| source code. There's no reason for node_modules to be as big as
| it usually is, text compresses extremely well. It's just general
| sloppiness endemic to the JavaScript ecosystem.
| MortyWaves wrote:
| Totally agree with you. I wish npm did a better job of
| filtering the crap files out of packages.
| Alifatisk wrote:
| At least, switching to pnpm minimizes the bloat
| jefozabuss wrote:
| I just installed a project with pnpm about 120 packages
| mostly react/webpack/eslint/redux related
|
| with prod env: 700MB
|
| without prod env: 900MB
|
| sadly the bloat cannot be avoided that well :/
| jeffhuys wrote:
| pnpm stores them in a central place and symlinks them.
| You'll see the benefits when you have multiple projects
| with a lot of the same packages.
| syncsynchalt wrote:
| You'll also see the benefit when `rm -rf`ing a
| `node_modules` and re-installing, as pnpm still has a
| local copy that it can re-link after validating its
| integrity.
| vinnymac wrote:
| You might be interested in e18e if you would like to see that
| change: https://e18e.dev/
|
| They've done a lot of great work already.
| KTibow wrote:
| Does this replace ljharb stuff?
| hinkley wrote:
| I believe I knocked 10% off of our node_modules directory by
| filing .npmignore PRs or bug reports to tools we used.
|
| Now if rxjs weren't a dumpster fire...
| TheRealPomax wrote:
| That's on the package publishers, not NPM. They give you an
| `.npmignore` that's trivially filled out to ensure your package
| isn't full of garbage, so if someone doesn't bother using that:
| that's on them, not NPM.
|
| (And it's also a little on the folks who install dependencies:
| if the cruft in a specific library bothers you, hit up the repo
| and file an issue (or even MR/PR) to get that .npmignore file
| filled out. I've helped folks reduce their packages by 50+MB in
| some cases, it's worth your own time as much as it is theirs)
| eitau_1 wrote:
| It's not even funny:
|
|     $ ll /nix/store/*-insect-5.9.0/lib/node_modules/insect/node_modules/clipboardy/fallbacks/*
|     /nix/store/...-insect-5.9.0/lib/node_modules/insect/node_modules/clipboardy/fallbacks/linux:
|     .r-xr-xr-x 129k root  1 Jan 1970 xsel
|     /nix/store/...-insect-5.9.0/lib/node_modules/insect/node_modules/clipboardy/fallbacks/windows:
|     .r-xr-xr-x 444k root  1 Jan 1970 clipboard_i686.exe
|     .r-xr-xr-x 331k root  1 Jan 1970 clipboard_x86_64.exe
|
| (clipboardy ships executables and none of them can be run on
| NixOS btw)
| cedws wrote:
| Are they reproducible? Shipping binaries in JS packages is
| dodgy AF - a Jia Tan attack waiting to happen.
| eitau_1 wrote:
| The executables are vendored in the repo [0].
|
| [0] https://github.com/sindresorhus/clipboardy/tree/main/fa
| llbac...
| dicytea wrote:
| I don't know why, but clipboard libraries tend to be really
| poorly implemented, especially in scripting languages.
|
| I just checked out clipboardy and all they do is dispatch
| binaries from the path and hope it's the right one (or if
| it's even there at all). I think I had a similar experience
| with Python and Lua scripts. There's an unfunny amount of
| poorly-written one-off clipboard scripts out there just
| waiting to be exploited.
|
| I'm only glad that the go-to clipboard library in Rust
| (arboard) seems solid.
| hombre_fatal wrote:
| One of the things I like about node_modules is that it's not
| purely source code and it's not purely build artifacts.
|
| You can read the code and you can usually read the actual
| README/docs/tests of the package instead of having to find it
| online. And you can usually edit library code for debugging
| purposes.
|
| If node_modules is taking up a lot of space across a bunch of
| old projects, just write the `find` script that recursively
| deletes them all; you can always run `npm install` in the
| future when you need to work on that project again.
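|
| A Node version of that cleanup script, as a sketch (pass the root
| directory as an argument; it defaults to the current directory):
|
|     // prune-node-modules.js - recursively remove every
|     // node_modules directory under the given root.
|     const fs = require('fs');
|     const path = require('path');
|
|     function prune(dir) {
|       for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
|         if (!entry.isDirectory()) continue;
|         const full = path.join(dir, entry.name);
|         if (entry.name === 'node_modules') {
|           fs.rmSync(full, { recursive: true, force: true });
|         } else {
|           prune(full);
|         }
|       }
|     }
|
|     prune(process.argv[2] || '.');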
| glenjamin wrote:
| This strikes me as something that could be done for the highest
| traffic packages at the backend, rather than be driven by the
| client at publish-time.
| fastest963 wrote:
| The article talks about this. There are hashes that are
| generated for the tarball so the backend can't recompress
| anything.
| BonoboIO wrote:
| In summary: It's a nice feature, which gives nice benefits for
| often downloaded packages, but nobody at npm cares for the
| bandwidth?
| nikeee wrote:
| I don't see why it wouldn't be possible to hide behind a flag
| once Node.js supports zopfli natively. In case of CI/CD, it's
| totally feasible to just add a --strong-compression flag. In that
| case, the user expects it to take its time.
|
| TS releases a non-preview version every few months, so using 2.5
| minutes for compression would work.
| abound wrote:
| A few people have mentioned the environmental angle, but I'd care
| more about if/how much this slows down decompression on the
| client. Compressing React 20x slower once is one thing, but 50
| million decompressions being even 1% slower is likely net more
| energy intensive, even accounting for the saved energy
| transmitting 4-5% fewer bits on the wire.
| web007 wrote:
| It's very likely zero or positive impact on the decompression
| side of things.
|
| Starting with smaller data means everything ends up smaller.
| It's the same decompression algorithm in all cases, so it's not
| some special / unoptimized branch of code. It's yielding the
| same data in the end, so writes equal out plus or minus disk
| queue fullness and power cycles. It's _maybe_ better for RAM
| and CPU because more data fits in cache, so less memory is used
| and the compute is idle less often.
|
| It's relatively easy to test decompression efficiency if you
| think CPU time is a good proxy for energy usage: go find
| something like React and test the decomp time of gzip -9 vs
| zopfli. Or even better, find something similar but much bigger
| so you can see the delta and it's not lost in rounding errors.
| sltkr wrote:
| For formats like deflate, decompression time doesn't generally
| depend on compressed size. (zstd is similar, though memory use
| can depend on the compression level used).
|
| This means an optimization like this is virtually guaranteed to
| be a net positive on the receiving end, since you always save a
| bit of time/energy when downloading a smaller compressed file.
| adgjlsfhk1 wrote:
| This seems like a place where the more ambitious version that
| switches to ZSTD might have better tradeoffs. You would get
| similar or better compression, with faster decompression and
| recompression than zstd. It would lose backward compatibility
| though...
| bufferoverflow wrote:
| Brotli and lzo1b have good compression ratios and pretty fast
| decompression speeds. Compression speed should not matter that
| much, since you only do it once.
|
| https://quixdb.github.io/squash-benchmark/
|
| There even more obscure options:
|
| https://www.mattmahoney.net/dc/text.html
| vlovich123 wrote:
| Not necessarily - could retain backward compat by publishing
| both gzip and zstd variants and having downloaders with newer
| npm versions prefer zstd. Over time, you could require
| packages only upload zstd going forward and either generate
| zstd versions of the backlog of unmaintained packages or at
| least those that see some amount of traffic over some time
| period if you're willing to drop very old packages. The ability
| to install arbitrary versions of packages means you're
| probably better off reprocessing the backlog, although that may
| cost more than doing nothing.
|
| The package lock checksum is probably a more solvable issue
| with some coordination.
|
| The benefit of doing this though is less immediate - it will
| take a few years to show payoff and these kinds of payoffs are
| not typically made by the kind of committee decisions process
| described (for better or worse).
| chuckadams wrote:
| Switching to a shared cache in the fashion of pnpm would
| eliminate far more redundant downloads than a compression
| algorithm needing 20x more CPU.
| liontwist wrote:
| The fact that you are pursuing this is admirable.
|
| But this whole thing sounds too much like work. Finding
| opportunities, convincing entrenched stakeholders, accommodating
| irrelevant feedback, pitching in meetings -- this is the kind of
| thing that top engineers get paid a lot of money to do.
|
| For me personally open source is the time to be creative and
| free. So my tolerance for anything more than review is very low.
| And I would have quit at the first roadblock.
|
| What's a little sad is that NPM should not be operating like a
| company with 1000+ employees. The "persuade us users want this"
| approach is only going to stop volunteers. They should be
| proactively identifying efforts like this and helping you bring
| it across the finish line.
| coliveira wrote:
| The problem is that this guy is treating open source as if it
| was a company where you need to direct a project to completion.
| Nobody in open source wants to be told what to do. Just release
| your work, if it is useful, the community will pick it up and
| everybody will benefit. You cannot force your improvement into
| the whole group, even if it is beneficial in your opinion.
| liontwist wrote:
| > where you need to direct a project to completion
|
| Do you want to get a change in, or not?
|
| Is this a project working with the community or not?
|
| > Just release your work
|
| What motivation exists to optimize package formats if nobody
| uses that package format? There are no benefits unless it's
| in mainline.
|
| > Nobody in open source wants to be told what to do
|
| He's not telling anybody to do work. He is sharing an
| optimization with clear tradeoffs - not a new architecture.
|
| > You cannot force your improvement into the whole group
|
| Nope, but communication is key. "put up a PR and we will let
| you know whether it's something we want to pursue".
|
| Instead they put him through several levels of gatekeepers
| where each one gave him incorrect feedback.
|
| "Why do we want to optimize bandwidth" is a question they
| should have the answer to.
|
| If this PR showed up on my project I would say "I'm worried
| about X,Y,Z" we will set up a test for X and Y and get back
| to you. Could you look into Z?
| gjsman-1000 wrote:
| > What's a little sad, is NPM should not be operating like a
| company with 1000+ employees. The "persuade us users want this"
| approach is only going to stop volunteers. They should be
| proactively identifying efforts like this and helping you bring
| it across the finish line.
|
| Says who?
|
| Says an engineer? Says a product person?
|
| NPM is a company with 14 employees; with a system integrated
| into countless extremely niche and weird integrations they
| cannot control. Many of those integrations might make a
| professional engineer's hair catch fire - "it should never be
| done this way!" - but the real world is that the wrong way is
| the majority of the time. There's no guarantee that many of the
| downloads come from the official client, just as one example.
|
| The last thing they need, or I want, or any of their customers
| want, or their 14 employees need, is something that might break
| backwards compatibility in an extremely niche case, anger a
| major customer, cause countless support tickets, all for a tiny
| optimization nobody cares about.
|
| This is something I've learned here about HN that, for my own
| mental health, I now dismiss: Engineers are obsessed with 2%
| optimizations here, 5% optimizations there; unchecked, it will
| literally turn into an OCD outlet, all for things nobody in the
| non-tech world even notices, let alone asks about. Just let it
| go.
| liontwist wrote:
| Open source needs to operate differently than a company
| because people don't have time/money/energy to deal with
| bullshit.
|
| Hell. Even 15 employees larping as a corporation is going to
| be inefficient.
|
| What you and NPM are telling us is that they are happy to
| take free labor, but this is not an open source project.
|
| > Engineers are obsessed with 2% optimizations here
|
| Actually in large products these are incredible finds. But
| ok. They should have the leadership to know which bandwidth
| tradeoffs they are committed to and tell him immediately it's
| not what they want, rather than sending him to various
| gatekeepers.
| gjsman-1000 wrote:
| Correct; NPM is not an "open source project" in the sense
| of a volunteer-first development model. Neither is Linux -
| over 80% of commits are corporate, and have been for a
| decade. Neither is Blender anymore - the Blender
| Development Fund raking in $3M a year calls the shots.
| Every successful "large" open source project has outgrown
| the volunteer community.
|
| > Actually in large products these are incredible finds.
|
| In large products, incredible finds may be true; but
| breaking compatibility with just 0.1% of your customers is
| also an incredible disaster.
| liontwist wrote:
| > breaking compatibility with just 0.1%
|
| Yes. But in this story nothing like that happened.
| gjsman-1000 wrote:
| But NPM has no proof their dashboard won't light up full
| of corporate customers panicking the moment it goes to
| production; because their hardcoded integration to have
| AWS download packages and decompress them with a Lambda
| and send them to an S3 bucket can no longer decompress
| fast enough while completing other build steps to avoid
| mandatory timeouts; just as one stupid example of
| something that could go wrong. IT is also demanding now
| that NPM fix it rather than modify the build pipeline
| which would take weeks to validate, so corporate's
| begging NPM to fix it by Tuesday's marketing blitz.
|
| Just because it's safe in a lab provides no guarantee
| it's safe in production.
| liontwist wrote:
| Ok, but why is the burden on him to show that? Are they
| not interested in improving bandwidth and speed for their
| users?
|
| The conclusion of this line of reasoning is to never make
| any change.
|
| If contributions are not welcome, don't pretend they are
| and waste my time.
|
| > can no longer decompress fast enough
|
| Already discussed this in another thread. It's not an
| issue.
| maccard wrote:
| That's an argument against making any change to the
| packaging system ever. "It might break something
| somewhere" isn't an argument, it's a paralysis against
| change. Improving the edge locality of delivery of npm
| packages could speed up npm installs. But speeding up npm
| installs might cause the CI system which is reliant on it
| for concurrency issues to have a race condition. Does
| that mean that npm can't ever make it faster either?
| gjsman-1000 wrote:
| It is an argument. An age old argument:
|
| "If it ain't broke, don't fix it."
| liontwist wrote:
| disable PRs if this is your policy.
| maccard wrote:
| This attitude is how in an age with gigabit fiber, 4GB/s
| hard drive write speed, 8x4 GHz cores with simd
| instructions it takes 30+ seconds to bundle a handful of
| files of JavaScript.
| stonemetal12 wrote:
| NPM is a webservice. They could package the top 10-15
| enhancements and call it V2. When 98% of traffic is V2, turn V1
| off. Repeat every 10 years or so until they work their way
| into having a good protocol.
| maccard wrote:
| > Engineers are obsessed with 2% optimizations here, 5%
| optimizations there; unchecked, it will literally turn into
| an OCD outlet, all for things nobody in the non-tech world
| even notices, let alone asks about. Just let it go.
|
| I absolutely disagree with you. If the world took more of
| those 5% optimisations here and there everything would be
| faster. I think more people should look at those 5%
| optimisations. In many cases they unlock knowledge that
| results in a 20% speed up later down the line. An example
| from my past - I was tasked with reducing the runtime of a
| one-shot tool we were using at $JOB. It was taking about
| 15 minutes to run. I shaved off seconds here and there with
| some fine grained optimisations, and tens of seconds with
| some modernisation of some core libraries. Nothing earth
| shattering but improvements none the less. One day, I noticed
| a pattern was repeating and I was fixing an issue for the
| third time in a different place (searching a gigantic array
| of stuff for a specific entry). I took a step back and
| realised that if I replaced the mega list with a hash table
| it might fix every instance of this issue in our app. It was
| a massive change, touching pretty much every file. And all of
| a sudden our 15 minute runtime was under 30 seconds.
|
| People used this tool every day, it was developed by a team
| of engineers wildly smarter than me. But it had grown and
| nobody really understood the impact of the growth. When it
| started that array was 30, 50 entries. On our project it was
| 300,000 and growing every day.
|
| Not paying attention to these things causes decay and rot.
| Not every change should be taken, but more people should
| care.
| lyu07282 wrote:
| > Says who?
|
| > Says an engineer?
|
| I prevent cross-site scripting, I monitor for DDoS attacks,
| emergency database rollbacks, and faulty transaction
| handlings. The Internet heard of it? Transfers half a
| petabyte of data every minute. Do you have any idea how that
| happens? All those YouPorn ones and zeroes streaming directly
| to your shitty, little smart phone day after day? Every
| dipshit who shits his pants if he can't get the new dubstep
| Skrillex remix in under 12 seconds? It's not magic, it's
| talent and sweat. People like me, ensuring your packets get
| delivered, un-sniffed. So what do I do? I make sure that one
| bad config on one key component doesn't bankrupt the entire
| fucking company. That's what the fuck I do.
| efitz wrote:
| I think that the reason NPM responded this way is because it
| was a premature optimization.
|
| If/when NPM has a problem - storage costs are too high, or
| transfer costs are too high, or user feedback indicates that
| users are unhappy with transfer sizes - then they will be ready
| to listen to this kind of proposal.
|
| I think their response was completely rational, especially
| given a potentially huge impact on compute costs and/or
| publication latency.
| maccard wrote:
| I disagree with it being a premature optimisation. Treating
| everything that you haven't already identified personally as
| a problem as a premature optimisation is cargo culting in its
| own way. The attitude of not caring is why npm and various
| tools are so slow.
|
| That said, I think NPM's response was totally correct -
| explain the problem and the tradeoffs. And OP decided the
| tradeoffs weren't worth it, which is totally fair.
| Cthulhu_ wrote:
| While NPM is open source, it's in the awkward spot of also
| having... probably hundreds of thousands if not millions of
| professional applications depend on it; it _should_ be run like
| a business, because millions depend on it.
|
| ...which makes it all the weirder that security isn't any
| better, as in, publishing a package can be done without a
| review step on the npm side, for example. I find it strange
| that they haven't doubled down on enterprise offerings, e.g.
| creating hosted versions (corporate proxies), reviewed /
| validated / LTS / certified versions of packages, etc.
| liontwist wrote:
| Why would a more complex zip slow down decompress? This comment
| seems to misunderstand how these formats work. OP is right.
| dvh wrote:
| Imagine being in the middle of nowhere, in winter, on Saturday
| night, on some farm, knee deep in a cow piss, servicing some 3rd
| party feed dispenser, only to discover that you have possible
| solution but it's in some obscure format instead of .tar.gz.
| Nearest internet 60 miles away. This is what I always imagine
| happening when some new obscure format come into play, imagine
| the poor fella, alone, cold, screaming. So incredibly close to
| his goal, but ultimately stopped by some artificial unnecessary
| made-up bullshit.
| tehjoker wrote:
| I believe zopfli compression is backwards compatible with
| DEFLATE, it just uses more CPU during the compression phase.
| PaulHoule wrote:
| Years back I came to the conclusion that conda using bzip2 for
| compression was a big mistake.
|
| Back then if you wanted to use a particular neural network it was
| meant for a certain version of Tensorflow which expected you to
| have a certain version of the CUDA libs.
|
| If you had to work with multiple models the "normal" way to do
| things was use the developer unfriendly [1][2] installers from
| NVIDIA to install a single version of the libs at a time.
|
| Turned out you could have many versions of CUDA installed as long
| as you kept them in different directories and set the library
| path accordingly, it made sense to pack them up for conda and
| install them together with everything else.
|
| But oh boy was it slow to unpack those bzip2 packages! Since
| conda had good caching, if you build environments often at all
| you could be paying more in decompress time than you pay in
| compression time.
|
| If you were building a new system today you'd probably use zstd
| since it beats gzip on both speed and compression.
|
| [1] click... click... click...
|
| [2] like they're really going to do something useful with my
| email address
| zahlman wrote:
| >But oh boy was it slow to unpack those bzip2 packages! Since
| conda had good caching, if you build environments often at all
| you could be paying more in decompress time than you pay in
| compression time.
|
| For Paper, I'm planning to cache both the wheel archives (so
| that they're available without recompressing on demand) and
| unpacked versions (installing into new environments will
| generally use hard links to the unpacked cache, where
| possible).
|
| > If you were building a new system today you'd probably use
| zstd since it beats gzip on both speed and compression.
|
| FWIW, in my testing LZMA is a big win (and I'm sure zstd would
| be as well, but LZMA has standard library support already). But
| there are serious roadblocks to adopting a change like that in
| the Python ecosystem. This sort of idea puts them several
| layers deep in meta-discussion - see for example
| https://discuss.python.org/t/pep-777-how-to-re-invent-the-wh...
| . In general, progress on Python packaging gets stuck in a
| double-bind: try to change too little and you won't get any
| buy-in that it's worthwhile, but try to change too much and
| everyone will freak out about backwards compatibility.
| kittikitti wrote:
| Thank you so much for posting this. The original logic was clear
| and it had me excited! I believe this is useful because
| compression is very common and although it might not fit
| perfectly in this scenario, it could very well be a breakthrough
| in another. If I come across a framework that could also benefit
| from this compression algorithm, I'll be sure to give you credit.
| bhouston wrote:
| What about a different approach - an optional npm proxy that
| recompresses popular packages with 7z/etc in the background?
|
| Could verify package integrity by hashing contents rather than
| archives, plus digital signatures for recompressed versions. Only
| kicks in for frequently downloaded packages once compression is
| ready.
|
| Benefits: No npm changes needed, opt-in only, potential for big
| bandwidth savings on popular packages. Main tradeoff is
| additional verification steps, but they could be optional given a
| digital signature approach.
|
| Curious if others see major security holes in this approach?
| ndriscoll wrote:
| This felt like the obvious way to do things to me: hash a .tar
| file, not a .tar.gz file. Use Accept-Encoding to negotiate the
| compression scheme for transfers. CDN can compress on the fly
| or optionally cache precompressed files. i.e. just use standard
| off-the-shelf HTTP features. These days I prefer uncompressed
| .tar files anyway because ZFS has transparent zstd, so
| decompressed archive files are generally smaller than a .gz.
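|
| A sketch of that with Node built-ins ("pkg.tgz" is a placeholder
| path): hashing the decompressed tar gives an integrity value that
| survives any change of transfer encoding.
|
|     // integrity-of-tar.js - npm-style sha512 integrity string
|     // computed over the *uncompressed* tar, so re-encoding the
|     // transfer (gzip, brotli, zstd, ...) doesn't change it.
|     const fs = require('fs');
|     const zlib = require('zlib');
|     const crypto = require('crypto');
|
|     const tar = zlib.gunzipSync(fs.readFileSync('pkg.tgz'));
|     const integrity =
|       'sha512-' + crypto.createHash('sha512').update(tar).digest('base64');
|
|     console.log(integrity);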
| cesarb wrote:
| > hash a .tar file, not a .tar.gz file
|
| For security reasons, it's usually better to hash the
| compressed file, since it reduces the attack surface: the
| decompressor is not exposed to unverified data. There have
| already been vulnerabilities in decompressor implementations
| which can be exploited through malformed compressed data (and
| this includes IIRC at least one vulnerability in zlib, which
| is the standard decompressor for .gz).
| bhouston wrote:
| This suggests one should just upload a tar rather than a
| compressed file. Makes sense because one can scan the
| contents for malicious files without risking a decompressor
| bug.
|
| BTW npm decompresses all packages anyhow because it lets
| you view the contents these days on its website.
| bhouston wrote:
| You are correct. They should be uploading and downloading
| dumb tar files and let the HTTP connection negotiate the
| compression method. All hashes should be based on the
| uncompressed raw tar dump. This would be proper separation of
| concerns.
| hinkley wrote:
| Does npm even default to gzip -9? Wikipedia claims zopfli is 80
| times slower under default settings.
| zahlman wrote:
| My experience has been that past -6 or so, gzip files get only
| a tiny bit smaller in typical cases. (I think I've even seen
| them get bigger with -9.)
| jefozabuss wrote:
| I wonder what the tarball size difference would be, on average,
| if you for example downloaded everything in one tarball (full
| package list) instead of 1-by-1, as the gzip compression would
| work way better in that case.
|
| Also for bigger companies this is not really a "big" problem as
| they usually have in-house proxies (as you cannot rely on a 3rd
| party repository in CI/CD for multiple reasons (security, audit,
| speed, etc)).
| snizovtsev wrote:
| Yes, but it was expected. It's like prioritising code readability
| over performance everywhere but the hot path.
|
| Earlier in my career, I managed to use Zopfli once to compress
| gigabytes of PNG assets into a fast in-memory database supporting
| a 50K+ RPS web page. We wanted to keep it simple and avoid the
| complexity of horizontal scaling, and it was OK to drop some
| rarely used images. So the more images we could pack into a
| single server, the more coverage we had. In that sense Zopfli was
| beneficial.
| 1337shadow wrote:
| Ok but why doesn't npm registry actually recompress the archives?
| It could even apply that retroactively; it wouldn't require
| zopfli in the npm CLI
| aseipp wrote:
| Hashes of the tarballs are recorded in the package-lock.json of
| downstream dependants, so recompressing the files in place will
| cause the hashes to change and break everyone. It has to be
| done at upload time.
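|
| For reference, the relevant fragment of a package-lock.json looks
| roughly like this (version and digest are placeholders); the
| `integrity` hash is taken over the compressed tarball as served,
| so recompressing it in place would break the check:
|
|     "node_modules/helmet": {
|       "version": "x.y.z",
|       "resolved": "https://registry.npmjs.org/helmet/-/helmet-x.y.z.tgz",
|       "integrity": "sha512-<base64 digest of the .tgz>"
|     }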
| notpushkin wrote:
| But it still can be done on the npm side, right?
| bhouston wrote:
| The hashes of the uncompressed tarballs would be great. Then
| the HTTP connection can negotiate a compression format for
| transfer (which can change over time at HTTP itself changes)
| rather than baking it into the NPM package standard (which is
| incredibly inflexible.)
| tonymet wrote:
| I'm more concerned about the 40 GB of node_modules. Why hasn't
| node supported tgz node_modules? That would save 75% of the space
| or more.
| bhouston wrote:
| isn't this a file system thing? Why bake it into npm?
| tonymet wrote:
| efficiency
| nopurpose wrote:
| It reminds me of an effort to improve the Docker image format and
| make it move away from being just a tar file. I can't find links
| anymore, but it was a pretty clever design, which still couldn't
| beat dumb tar in efficiency.
| bhouston wrote:
| Transferring around dumb tar is actually smart because the
| HTTPS connection can negotiate a compressed version of it to
| transfer - e.g. gzip, brotli, etc. No need to bake in an
| unchangeable compression format into the standard.
| woadwarrior01 wrote:
| From the RFC on github[1].
|
| > Zopfli is written in C, which presents challenges. Unless it
| was added to Node core, the CLI would need to (1) rewrite Zopfli
| in JS, possibly impacting performance (2) rely on a native
| module, impacting reliability (3) rely on a WebAssembly module.
| All of these options add complexity.
|
| Wow! Who's going to tell them that V8 is written in C++? :)
|
| [1]: https://github.com/npm/rfcs/pull/595
| kmacdough wrote:
| It's not about C per se, as much as each native compiled
| dependency creates additional maintenance concerns. Changes to
| hardware/OS can require a recompile or even fixes. NPM build
| system already requires a JavaScript runtime, so is already
| handled as part of existing maintenance. The point is that
| Zopfli either needs to be rewritten for a platform-agnostic
| abstraction they already support, or else Zopfli will be added
| to a list of native modules to maintain.
| woadwarrior01 wrote:
| > It's not about C per se, as much as each native compiled
| dependency creates additional maintenance concerns. Changes
| to hardware/OS can require a recompile or even fixes.
|
| This is a canard. zopfli is written in portable C and is far
| more portable than the nodejs runtime. On any hardware/OS
| combo that one can build the nodejs runtime, they certainly
| can also build and run zopfli.
| gweinberg wrote:
| I was under the impression that bzip compresses more than gzip,
| but gzip is much faster, so gzip is better for things that need
| to be compressed on the fly, and bzip is better for things that
| get archived. Is this not true? Wouldn't it have been better to
| use bzip all along for this purpose?
| bangaladore wrote:
| I think the main TLDR here [1]:
|
| > For example, I tried recompressing the latest version of the
| typescript package. GNU tar was able to completely compress the
| archive in about 1.2 seconds on my machine. Zopfli, with just 1
| iteration, took 2.5 minutes.
|
| [1] https://github.com/npm/rfcs/pull/595#issuecomment-1200480148
|
| My question of course would be, what about LZ4, or Zstd or
| Brotli? Or is backwards compatibility strictly necessary? I
| understand that GZIP is still a good compressor, so those others
| may not produce meaningful gains. But, as the author suggests,
| even small gains can produce huge results in bandwidth reduction.
| loeg wrote:
| It probably makes more sense to save more bytes and compressor
| time and just switch to zstd (a bigger scoped effort, sure).
| imoverclocked wrote:
| > But the cons were substantial: ...
| > This wouldn't retroactively apply to existing packages.
|
| Why is this substantial? My understanding is that packages
| shouldn't be touched once published. It seems likely for any
| change to not apply retroactively.
| szundi wrote:
| This is a nice guy
| omoikane wrote:
| > Integrating Zopfli into the npm CLI would be difficult.
|
| Is it possible to modify "gzip -9" or zlib to invoke zopfli? This
| way everyone who wants to compress better will get the extra
| compression automatically, in addition to npm.
|
| There will be an increase in compression time, but since "gzip
| -9" is not the default, people preferring compression speed might
| not be affected.
| bombcar wrote:
| You'd have more problems here, but you could do it - if you let
| it take ages and ages to percolate through all environments.
|
| It's been almost 30 years since bzip2 was released and even now
| not everything can handle tar.bz2
| arccy wrote:
| probably because bzip2 isn't a very good format
| tehjoker wrote:
| 5% improvement is basically the minimum I usually consider
| worthwhile to pursue, but it's still small. Once you get to 10%
| or 20%, things become much more attractive. I can see how people
| can go either way on a 5% increase if there are any negative
| consequences (such as increased build time).
| Alifatisk wrote:
| I wonder if there is a way to install npm packages without
| the crap they come with (like docs, tests, readme, etc.).
| rafaelmn wrote:
| I wonder if you could get better results if you built a
| dictionary over all of npm. I suspect most common words could
| easily be reduced to a 16k-word index. It would be much faster,
| the dictionary would probably fit in cache, and you could even
| optimize its in-memory layout for cache prefetch.
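|
| Deflate already supports preset dictionaries, so a toy version of
| the idea can be tried with Node built-ins (the dictionary below is
| just a few strings common in JS source, standing in for a
| corpus-trained one):
|
|     // dict-demo.js - raw deflate with and without a shared preset
|     // dictionary; a real one would be trained on npm itself.
|     const zlib = require('zlib');
|
|     const dictionary = Buffer.from(
|       'function module.exports require("react") const return export default'
|     );
|     const source = Buffer.from(
|       'const react = require("react"); module.exports = function App() { return null; };'
|     );
|
|     const plain = zlib.deflateRawSync(source);
|     const withDict = zlib.deflateRawSync(source, { dictionary });
|
|     console.log('no dictionary  :', plain.length, 'bytes');
|     console.log('with dictionary:', withDict.length, 'bytes');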
| zahlman wrote:
| This seems like a non-starter to me - new packages are added to
| npm all the time, and will alter the word frequency
| distribution. If you aren't prepared to re-build constantly and
| accept that the dictionary isn't optimal, then it's hard to
| imagine it being significantly better than what you build with
| a more naive approach. Basically - why try to fine-tune to a
| moving target?
| arccy wrote:
| would it really change that quickly? you might get
| significant savings from just having keywords, common
| variable names, standard library functions
| hinkley wrote:
| Pulling on this thread, there are a few people who have looked at
| the ways zopfli is inefficient. Including this guy who forked it,
| and tried to contribute a couple improvements back to master:
|
| https://github.com/fhanau/Efficient-Compression-Tool
|
| These days if you're going to iterate on a solution you'd better
| make it multithreaded. We have laptops where sequential code uses
| 8% of the available cpu.
| frabjoused wrote:
| This reminds me of a time I lost an argument with John-David
| Dalton about cleaning up/minifying lodash as an npm dependency,
| because when including the readme and license for every sub-
| library, a lodash import came to ~2.5MB at the time. This also
| took a lot of seeking time for disks because there were so many
| individual files.
|
| The conversation started and ended at the word cache.
| zahlman wrote:
| I'd love to see an effort like like this succeed in the Python
| ecosystem. Right now, PyPI is dependent upon Fastly to serve
| files, on the order of >1 petabyte per day. That's a truly
| massive in-kind donation, compared to the PSF's operating budget
| (only a few million dollars per year - far smaller than Linux or
| Mozilla).
| cozzyd wrote:
| No problem, I'm sure if Fastly stopped doing it JiaTanCo would
| step up
| JoeAltmaier wrote:
| These days technology moves so fast it's hard to keep up. The
| slowest link in the system is the human being.
|
| That's a strong argument for "if it isn't broke, don't fix it."
|
| Lots of numbers being thrown around; you add up tiny things
| enough times you can get a big number. But is npm package
| download the thing that's tanking the internet? No? Then this is
| a second- or third-order optimization.
| luzifer42 wrote:
| I once created a Maven plugin to recompress Java artefacts with
| zopfli. I rewrote it in Java so it runs entirely in the JVM. This
| means the speed is worse and it may contain bugs:
|
| https://luccappellaro.github.io/2015/03/01/ZopfliMaven.html
___________________________________________________________________
(page generated 2025-01-27 23:00 UTC)