[HN Gopher] NPMprune: Remove unnecessary files from node_modules...
___________________________________________________________________
NPMprune: Remove unnecessary files from node_modules to optimize
storage
Author : arthurwhite
Score : 36 points
Date : 2023-11-29 17:14 UTC (5 hours ago)
(HTM) web link (github.com)
| lxe wrote:
| We did exactly this when packaging and deploying large node
| manifests at one of my former companies.
|
| Be super careful of removing large swaths of files. Out of
| 150,000 node modules in your manifest, I'm willing to bet at
| least one of them is doing something by reading one of these non-
| source files.
| sbarre wrote:
| This was my concern as well.
|
| Looking at the script source, it's just matching globs, so
| there isn't much smarts to this. I'm sure it works most of the
| time, but yeah..
|
| Do JS packages need some kind of .prodignore file similar to
| other .ignore files?
|
| So with a flag passed, after doing an npm install, there's an
| extra cleanup step that removes explicitly marked files that
| aren't needed for running in prod?
|
| (Not a fully formed idea, I'm sure I'm not thinking of
| drawbacks with this)
|
| Edit: this sort of exists as the .npmignore file?
|
| https://docs.npmjs.com/cli/v10/using-npm/developers#keeping-...
| rezonant wrote:
| .npmignore is the opposite of `files`: it omits files when
| creating the package itself, which is different from trimming
| the files when the package is installed. That said,
| files/npmignore is the correct way to deal with this and you
| should never remove files from the packages you install
| without extremely good reasons, and when you do it, it should
| be very narrowly scoped and handled automatically as part of
| npm install. It should be totally valid to delete
| node_modules and reinstall everything without causing
| problems. This is also the biggest reason to never commit
| node_modules, aside from the pure insanity of committing
| hundreds of thousands of vendor-managed files and inviting
| merge conflicts when two branches change those files...
| QuadmasterXLII wrote:
| Test and bifurcate I guess.
| lxe wrote:
| In production, preferably. This way you'll immediately find
| any issues and will have top priority allocated to fixing
| them.
| armchairhacker wrote:
| I use https://pnpm.io whenever possible. It has many benefits,
| the main one is that the node modules are symlinked to one big
| repo in your home directory, so there isn't nearly as much
| duplication.
| rpastuszak wrote:
| Yup, same here. I've saved 40 GB of data by recursively removing
| all node_modules directories from my Mac and replacing npm with
| pnpm.
|
| I did notice small issues with some libraries (react testing
| library IIRC)
| paulddraper wrote:
| *hard linked
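|
| What a hard link means in practice can be sketched with Node's
| fs module. The directory layout and file names below are invented
| for the demo; pnpm's real store layout is more involved. On a
| POSIX filesystem the stored file and both project copies share
| one inode, so the data exists on disk once.
|

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical layout: one shared "store" and two projects linking into it.
function linkDemo() {
  const base = fs.mkdtempSync(path.join(os.tmpdir(), "hardlink-demo-"));
  const stored = path.join(base, "store", "left-pad.js");
  fs.mkdirSync(path.dirname(stored), { recursive: true });
  fs.writeFileSync(stored, "module.exports = (s) => s;\n");

  const copies = ["project-a", "project-b"].map((name) => {
    const target = path.join(base, name, "node_modules", "left-pad.js");
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.linkSync(stored, target); // hard link: another name for the same data
    return target;
  });

  const storeStat = fs.statSync(stored);
  return {
    // Number of directory entries pointing at the same file data.
    linkCount: storeStat.nlink,
    // True when every project copy shares the store file's inode.
    sameInode: copies.every((p) => fs.statSync(p).ino === storeStat.ino),
  };
}

console.log(linkDemo());
```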
| akoboldfrying wrote:
| Curious whether this works on Windows? Symlinks are strange
| there (though hard links work fine on NTFS).
| otteromkram wrote:
| If you don't want to risk adding something nefarious to your
| Google history, DuckDuckGo is a nice alternative.
|
| As my ol' grandpappy used to say, "Why wonder? Let's go
| search the Internets!"
| thekevinscott wrote:
| Another vote for pnpm.
|
| Was introduced at work and it's a game changer. The monorepo
| support (via "workspace:*") is absolutely clutch too.
| swatcoder wrote:
| > In deployment scripts:
|
| >
|
| > wget -qO- https://raw.githubusercontent.com/xthezealot/npmprune/master... | sh -- -p
|
| Serious question: Is this the norm now? Are people actually
| executing unversioned wget'd shell scripts from random github
| users as part of their deployment workflow?
| petesergeant wrote:
| > now
|
| For about the last 15 years
| mobilio wrote:
| also
|
| curl ... | sudo bash
| dunham wrote:
| And before the web existed, people would distribute software
| packaged inside executable shell scripts
| (https://en.wikipedia.org/wiki/Shar).
|
| It looks like that practice goes back at least 40 years.
| nailer wrote:
| The threat model is exactly the same as executing untrusted,
| uninspected content you've downloaded locally.
|
| I could do some tricks where I sent different files based on
| user agent, but still... most people aren't inspecting the
| download anyway before running it.
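|
| One way to narrow that threat model is to pin a checksum of the
| script version that was actually reviewed, and refuse to run
| anything else. A minimal sketch, assuming the script bytes have
| already been fetched; verifyScript is a made-up helper name, and
| the demo uses an in-memory stand-in rather than a real download.
|

```javascript
const crypto = require("node:crypto");

// Returns the script text only if its SHA-256 digest matches the pinned value.
function verifyScript(scriptBytes, expectedSha256Hex) {
  const actual = crypto.createHash("sha256").update(scriptBytes).digest("hex");
  if (actual !== expectedSha256Hex.toLowerCase()) {
    throw new Error(`checksum mismatch: got ${actual}`);
  }
  return scriptBytes.toString("utf8");
}

// Demo with an in-memory stand-in for a downloaded script. In real use the
// pinned digest would be recorded once, at review time, not computed here.
const downloaded = Buffer.from("echo pruning\n");
const pinned = crypto.createHash("sha256").update(downloaded).digest("hex");
console.log(verifyScript(downloaded, pinned).trim());
```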
| im3w1l wrote:
| > I could do some tricks where I sent different files based
| on user agent
|
| Not from githubusercontent you couldn't. Which I'd say is
| where the majority of these scripts are hosted.
| c0n5pir4cy wrote:
| Just for package authors (or people looking for some easy pull
| requests) out there that might not know this exists.
|
| npm's package.json has a `files` field which allows you to define
| which files are included on an npm install:
| https://docs.npmjs.com/cli/v6/configuring-npm/package-json#f....
|
| This also extends to an .npmignore file that works similarly to a
| .gitignore file.
| creatonez wrote:
| Just beware that some files may seem unnecessary but are
| expected from an idiomatic npm package. Three things that come
| to mind -- a markdown file named README.md, any generated
| typescript definitions, and typescript/babel sourcemaps. And
| something I've seen far too often: please don't give a
| minified, rolled up bundle as the only option, otherwise you
| are chucking your library's users back into the dark ages of
| Bower.js.
| nfriedly wrote:
| The readme gets included automatically even if you don't
| specify it in the files field, ditto for the changelog,
| license, and package.json.
|
| Compare
| https://github.com/express-rate-limit/express-rate-limit/blo...
| to
| https://www.npmjs.com/package/express-rate-limit?activeTab=c...
|
| Agree with you about the other points.
| rezonant wrote:
| Strongly agree on minification. You should not minify or
| bundle anything in your NPM package. That decision should
| only be made by the top level project if it wishes.
| josephg wrote:
| Yep. And if you're writing typescript, please include type
| definitions, source maps, type definition source maps and the
| original typescript source.
|
| Having all of this stuff makes it possible to ctrl+click on
| functions in my libraries and read the corresponding source
| code. That's a godsend during development - well worth a few
| extra kb of files in the npm module.
|
| tsconfig.json:
|     "declaration": true,
|     "declarationMap": true,
|     "sourceMap": true,
|     ...
|
| package.json (assuming typescript compiles src/ to dist/):
|     "files": [ "dist/*", "src/*" ],
| shepherdjerred wrote:
| It makes me wonder why these are even configurable. These
| should all be emitted by default.
| arthurwhite wrote:
| If everyone used the `files` field, the world would be a better
| place, for sure.
| an_ko wrote:
| This is wildly unsafe.
|
| - Some packages contain non-JS files for good reasons, and they
| may break in subtle unpredictable ways when you mess with the
| contents of their package.
|
| - Node.js will happily run JavaScript files even if they're not
| "*.js": A file like "hello.alsdfhlshdfl" works just fine as long
| as its content parses. There is no guarantee that your
| dependencies (and their recursive dependencies) don't statically
| or dynamically load files with completely arbitrary filenames.
|
| - If you distribute packages with license files stripped this
| way, you are violating licenses that require the license to be
| distributed along with the code.
|
| If this is actually a major issue for you, consider instead
| sending PRs to upstream to tidy up their package. This will also
| benefit other users.
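|
| The point about file extensions can be checked with a few lines
| of Node.js. This is a sketch assuming the CommonJS loader, which
| treats a file with an unrecognized extension as JavaScript; the
| function name is made up and the file name is the one from the
| comment above.
|

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Write a CommonJS module under a nonsense extension and load it anyway.
function loadOddlyNamedModule() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ext-demo-"));
  const file = path.join(dir, "hello.alsdfhlshdfl");
  fs.writeFileSync(file, "module.exports = () => 'hello';\n");
  // No ".js" in sight, yet require() parses and runs the file as JavaScript.
  return require(file);
}

console.log(loadOddlyNamedModule()());
```

| A pruner that decides by file name alone has no way to know that
| such a file is code.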
| arthurwhite wrote:
| - Of course, this entails the risk of occasional breakage. But
| for 99% of modules, this has no impact at runtime.
|
| - The patterns used to find files are specific enough to target
| only those files that are well known to be useless at runtime.
|
| - The license texts of these libraries can be copied and merged
| into a main LICENSE file.
|
| - Have you seen the number of modules installed by most major
| libraries? Making a pull request for each of them is humanly
| impossible and counter-productive. It's easier to use a simple
| script that frees up dozens of MB in a few seconds.
| swatcoder wrote:
| > But for 99% of modules, this has no impact at runtime.
|
| Traditionally, this wasn't an acceptable way to think about
| projects we engineers were being paid lots of money to build.
|
| As you note, a project may hoover in some absurd number of
| dependent libraries and you have no tooling that tells you
| which of those might fall in the 1% and what code paths in
| those 1% intersect with call stacks in your project. You have
| no idea what impact blindly deleting some "They're probably
| unnecessary" files in somebody else's code will have on your
| application and no insight into how to make sure your testing
| unearths problems. It's an invitation to phantom bugs of
| unknown scope and the most frustrating kind of debugging
| effort that comes from chasing those kinds of phantoms.
|
| It's already bad enough that people don't read and review
| their dependent code with the eye they bring to PRs from
| their on-team colleagues, but to then go futzing around and
| deleting things in the unread dependencies because you have a
| hunch that it's no big deal is about as far from software
| _engineering_ as you can get.
| akdor1154 wrote:
| > but for 99% of modules, this has no impact at runtime
|
| So for the typical enterprise crapware where the app template
| installs about 2,000 packages for a React Hello World, how
| many broken modules is that?
| filterfiber wrote:
| > Of course, this entails the risk of occasional breakage.
| But for 99% of modules, this has no impact at runtime.
|
| Right, so most projects end up with hundreds of modules (a
| random one I have has 700+), which would mean multiple breakages.
|
| The worst part isn't the breakage - it's not knowing where or
| when it breaks, and because it could be missed when it's
| being bundled it can happen in production.
|
| The bundling step should effectively be doing the file
| pruning for you (or even parts of files) and you can be a lot
| more confident that it won't miss things.
|
| node_modules are generally big (580MB in my case), but I
| don't know why you'd trade reliability for 580MB of storage.
| For us, the 580MB gets bundled down to under 1MB for our web
| application, and essentially all dev machines have 512GB+ at
| this point anyway.
| pavlov wrote:
| Why not use yarn? It has a much more reliable solution:
|
| https://yarnpkg.com/features/pnp
| arthurwhite wrote:
| Because its primary focus is on redefining how dependencies are
| stored and accessed, rather than modifying the contents of
| these dependencies.
|
| Useless files will still be there.
|
| Also, when you create a Docker image, you avoid packing in dev
| tools that aren't absolutely essential (such as Yarn).
| butshouldyou wrote:
| FYI: The default node Docker images already include yarn.
| Alifatisk wrote:
| Just use pnpm
| arthurwhite wrote:
| While pnpm optimizes storage and reduces duplication, it does
| not inherently remove non-essential files (like documentation,
| Markdown, or test files) within the dependencies.
|
| Also, when you create a Docker image, you avoid packing in dev
| tools that aren't absolutely essential (such as pnpm).
| nusmella wrote:
| Even the alpine nodejs images have pnpm and yarn nowadays
| dlrush wrote:
| Just use PNPM
| Ayesh wrote:
| Yeah no.
|
| npm already does this at the package registry with
| ignore/npmignore files, and that's the package author's choice.
| How much storage can you really save? 50MB? 200MB? Is it really
| worth the risk of running rm on some glob pattern and crossing
| your fingers that the packages don't require any of the deleted
| files?
| arthurwhite wrote:
| Not everyone uses the .npmignore file. Maybe it's the author's
| choice, but in the meantime, that's my personal storage space
| that's being used unnecessarily.
|
| I tested it recently on a clean install of Strapi: about 250 MB
| are freed up. Storage is cheap but that still represents a lot,
| especially inside a Docker image.
|
| The patterns used to find files are specific enough to target
| only those files that are well known to be useless at runtime.
| rezonant wrote:
| Others have pointed out that you cannot tell which files are
| useless at runtime just by inspecting their filenames. Executable
| JS does not need the .js extension to be loaded by Node.js or
| any other runtime environment, and on server runtimes files
| can be read at runtime, so JSON files, markdown files,
| webassembly modules, or any other kind of non-JS content can
| have a runtime impact.
|
| You are taking a big risk of subtle breakage right now, and a
| big risk of breakage later as you change your project code,
| since you may start to invoke a code path that needs that
| resource.
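|
| A sketch of how that delayed breakage can look. The package here
| is a throwaway built in a temp directory, and HELP.md and the
| function names are invented for the demo. The module still loads
| after the Markdown file is removed; it only fails once the code
| path that reads the file is called.
|

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Build a throwaway package whose help() reads a Markdown file lazily.
function makeFakePackage() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "fake-pkg-"));
  fs.writeFileSync(path.join(dir, "HELP.md"), "# usage\n");
  const source = [
    'const fs = require("node:fs");',
    'const path = require("node:path");',
    'exports.version = () => "1.0.0";',
    'exports.help = () => fs.readFileSync(path.join(__dirname, "HELP.md"), "utf8");',
    "",
  ].join("\n");
  fs.writeFileSync(path.join(dir, "index.js"), source);
  return dir;
}

// Load the package, delete the "useless" file, then exercise both code paths.
function pruneAndProbe() {
  const dir = makeFakePackage();
  const pkg = require(path.join(dir, "index.js"));
  fs.unlinkSync(path.join(dir, "HELP.md")); // simulate glob-based pruning of *.md
  const result = { loads: pkg.version() === "1.0.0", helpError: null };
  try {
    pkg.help();
  } catch (err) {
    result.helpError = err.code; // the failure surfaces only on this path
  }
  return result;
}

console.log(pruneAndProbe());
```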
| dmitrygr wrote:
| > Remove unnecessary [...] node_modules
|
| Try this bash one-liner :)
|
|     find / -name node_modules -print0 | xargs -0 rm -rf
| leipert wrote:
| Yarn@1 has this autoclean feature:
| https://classic.yarnpkg.com/en/docs/cli/autoclean
|
| I used to use it, but at some point the hassle of the
| occasionally breaking package wasn't worth it.
| mirekrusin wrote:
| Just bundle your back-end production entrypoint as a single JS
| file, similar to the front end.
| joshmanders wrote:
| Can someone explain to me why this is even necessary? I have at
| the time of this comment, 32 node projects on my machine, all
| with their own node_modules, and I'm using less than 200GB
| (total, including everything else on my machine) of my total 1TB
| hard drive space...
|
| Are people that concerned about the size of a directory on their
| machines?
| arthurwhite wrote:
| Maybe not on their machine, but for Docker images that will be
| pulled a thousand times, yes.
___________________________________________________________________
(page generated 2023-11-29 23:01 UTC)