[HN Gopher] Dependencies Belong in Version Control
___________________________________________________________________
Dependencies Belong in Version Control
Author : crummy
Score : 49 points
Date : 2023-11-26 21:08 UTC (1 hour ago)
(HTM) web link (www.forrestthewoods.com)
(TXT) w3m dump (www.forrestthewoods.com)
| o11c wrote:
| > Have you ever had a build fail because of a network error on
| some third-party server? Commit your dependencies and that will
| never happen.
|
| And here is the true root of the problem - people use build
| systems as if they were package managers.
|
| Use a real package manager (and install your dependencies
| _before_ the build) and suddenly it is clear why dependencies do
| not, in fact, belong in version control.
| api wrote:
| If you mean an OS package manager, now your dependencies are
| all really old and you have only one version to choose from.
| There's a reason Rust and Go built their own dependency
| fetching systems, and other languages like Java have had them
| for years.
| reactordev wrote:
| And you should use them instead of checking in Jar files into
| your repo...
| shortrounddev2 wrote:
| vcpkg is really awesome, but the versioning system leaves
| something to be desired
| gravypod wrote:
| I don't think a package manager solves this case. If you own a
| repo with software in it, you'll make sure to back up everything
| in that repo. But if part of the build fetches something from
| the network, that fetch will eventually fail, and you will no
| longer be able to build your software.
|
| I recently had someone ping me for a build of software that I
| wrote >10 years ago. I couldn't build the code in the repo since
| I hadn't committed the deps into my repo and the servers my
| build scripts reached out to were down.
| amluto wrote:
| It's worse than just that. Say I pick Fedora as my base. Now
| my project is a pile of SRPMs that need a nasty bootstrap to
| build (okay, this is hard to avoid), and I'm stuck _using the
| Fedora build system_. Which sucks, sorry. (Okay, at least
| there is only one Fedora build system.) Also a whole OS is
| dragged along. If I want to edit a dependency, eww, I have to
| commit patch files and muck with .spec files. Good times.
|
| Good tooling like the author describes could exist. No distro
| I've ever used is it. Gentoo is probably closer than Fedora,
| as Gentoo is actually intended to _build_ packages as opposed
| to just running them.
| jiggawatts wrote:
| This wouldn't be a problem if VCS systems used something like
| S3 blob storage by default for large binary files. Just store
| the torrent-like Merkle tree hash in the VCS database.
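|
| Git LFS pointer files already work roughly this way: the repo
| stores a tiny text pointer and the blob lives elsewhere (the
| hash and size here are placeholders):
      version https://git-lfs.github.com/spec/v1
      oid sha256:<content-hash>
      size 12345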
| eternityforest wrote:
| I wish package managers had P2P fetching from machines on the
| LAN as a fully supported built-in feature.
|
| Unfortunately, if they did, they'd probably use IPFS, and I
| don't know when IPFS will fix its idle bandwidth issues. A lot
| of people seem to just give up and use web gateways, completely
| defeating the purpose of working without single points of
| failure.
| dharmab wrote:
| https://github.com/microsoft/p4vfs
| gravypod wrote:
| I think something like this, if someone could build it, would be
| amazing. Essentially, this would bring the benefits of a BigCo
| monorepo (and all the SWE-time performance benefits it has) to
| the rest of the world. Lots of nice things could come from tech
| like this being adopted.
|
| I don't think it would get mass adoption though.
| Git+GitHub+$PackageManager is "good enough" and this approach
| wouldn't be significantly better for every use case.
| cxr wrote:
| > Git+GitHub+$PackageManager is "good enough"
|
| Weird way to frame it. Surely if that's good enough, then Git +
| repo host alone, sans package manager, satisfies the same
| criterion. It didn't/doesn't become more complicated by
| omitting package managers like npm, cargo, etc., and their
| associated methodology from the equation. It's the other way
| around. Adding a boondoggle like that into the fray is strictly
| _more_ complicated. It's extra.
| api wrote:
| "Vendoring" dependencies was a good idea that fell out of favor
| because it wasn't quite implemented right. It should be
| revisited.
| rch wrote:
| I'm annoyed that every few months I'll start something new that I
| know will eventually run on Kudu and Impala (per my employer),
| but the local build requirements are such that it's more
| effective to start with Postgres and figure out porting later on.
| As a NixOS user, I know the answer, but I just haven't allocated
| the time. Maybe this holiday season... Advent of Nix or
| something.
| echelon wrote:
| This top-down, prescriptive suggestion is wrong. The truth is
| that it depends on project construction, language, build
| toolchain, operating system support, and libraries.
|
| Python projects are a particular hell as the multiple attempts to
| solve dependencies didn't capture transitive dependencies well.
| Python also builds against native dynamically linked libraries,
| which introduces additional hell. But that's _Python_.
|
| The author is trying to use ML projects on Windows, and he
| probably hasn't realized that academics authoring this code
| aren't familiar with best practices, aren't writing multi-
| platform support, and aren't properly and hermetically packaging
| their code.
|
| To compare Python to a few other languages:
|
| - Rust projects are almost always incredibly hermetic. You really
| only need to vendor something in if you want to make changes to
| it.
|
| - Modern Javascript is much better than the wild west of ten
| years ago. Modern tools capture repeatable builds.
|
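| For instance, a minimal sketch of what "repeatable" means with
| modern npm, as I understand it:
      npm ci   # installs exactly what package-lock.json records, or fails
|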
| Don't go checking in built binaries unless you understand that
| your particular set of software needs it. Know the right tools
| for the job.
| jmyeet wrote:
| No. No, they don't. Specifically, _binary_ dependencies don't
| belong in a repo, and you want to use binary dependencies rather
| than source dependencies where possible.
|
| Once again, we see 20 years of dependency systems that have
| failed to do what Maven established as the bare minimum.
| Specifically:
|
| 1. Create signed, immutable versions of dependencies so a
| given release can't be changed out from under you;
|
| 2. Allow you to specify a specific version, the latest version or
| even the latest minor version of a specific major version;
|
| 3. Allow you to run internal repos in corporate environments
| where you might want to publish private libraries; and
|
| 4. Keep version information away from the source code itself.
| Putting GitHub URLs in Go source files is the most egregious
| example of bad dependency management from a language in recent
| history.
|
| Every line of source code, whether it's yours or third-party,
| comes at a cost. Depending on your toolchain, this may well
| increase compilation time and required resources.
|
| You want _reproducible_ builds. If you can do that without
| putting every dependency in a repo then you should. If you
| can't, then you have a bad dependency system.
| shortrounddev2 wrote:
| I think the only C/C++ package manager that is even close to
| maven is vcpkg
| jmyeet wrote:
| Header files are both a huge strength and a huge weakness of
| C/C++, and are really an anachronism now. The preprocessor is
| very general-purpose text substitution, which is powerful, but
| it means you have to include .h files to get types, signatures
| and so on. This kills binary compatibility (specifically, if
| you're linking to static or shared libraries, you still need a
| .h file).
|
| Even C++ templates aren't really much better; they're still
| text replacement.
|
| More modern languages have taken the use cases for .h files
| and incorporated them without the general-purpose text
| substitution. Rust's macros are superior to #define macros,
| and type aliases (in various languages) are better than
| #define uint32 unsigned int.
| __MatrixMan__ wrote:
| I definitely agree about URLs in Go.
|
| Maven comes up all the time as an example of packaging done
| correctly. It just does JVM stuff though, right? Seems like
| it's winning at an easier game.
|
| Reproducible builds are a hard thing to achieve in the general
| case. Even something as simple as packaging your files in a tar
| will blow your determinism (timestamps, ownership, file order).
|
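| For what it's worth, GNU tar (1.28+, if I recall) can be pinned
| down with something like:
      tar --sort=name --mtime='UTC 1970-01-01' --owner=0 --group=0 \
          --numeric-owner -cf release.tar ./src
|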
| I think retooling for determinism is worth spending time on,
| but I'm not sure I could convince my boss of that.
|
| So far as I'm aware, all dependency systems are bad. Nix is the
| least bad one I know.
| jmyeet wrote:
| So Maven is for JVM languages and binaries. That includes
| things people probably don't care about anymore (eg Groovy,
| Clojure, Kotlin, Scala) because they all compile to JVM
| bytecode, so as far as Maven is concerned they're
| indistinguishable.
|
| You can include any static assets you want (eg css, js, html
| files); I honestly haven't looked into that much. Putting
| static assets in a project is relatively straightforward, but
| a JS dependency? Yeah, that's probably a no-go. Or it's really
| awkward.
|
| For hermetic builds, given a blank slate, I'd probably start
| with Bazel.
| justinwp wrote:
| > This is exactly how Google and Meta operate today
|
| I wish it were that great. It works until you need a different
| version than some other piece of code, import a dependency that
| requires 100 others, or need to build for some platform other
| than the "blessed" target. I choose google3 only when I have to
| or when I am certain of the scope. (I am in DevRel, so I have
| more options than typical engineers for my work.)
| reactordev wrote:
| >"My background is C++ gamedev"
|
| This is why you think this way. Your proposal is not new. The
| issue is not in the fact that you don't have your dependencies,
| it's the fact that you are coming from a world that
| doesn't/hasn't had support for it. Every other language has had
| package managers for this very reason.
|
| Where do you draw the line? OS libs from the system package
| manager are OK, but a language package manager isn't?
|
| Go learn vcpkg and come back to us when you learn why everyone
| does it this way.
| codetrotter wrote:
| > The issue is not in the fact that you don't have your
| dependencies, it's the fact that you are coming from a world
| that doesn't/hasn't had support for it.
|
| Exactly. With Rust you can commit your Cargo.lock file and you
| will then be able to rebuild your project with the exact same
| versions of all your deps in the future. No need to commit the
| deps themselves.
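|
| A minimal sketch of that workflow (--locked makes the build
| fail rather than silently update the lockfile):
      git add Cargo.lock && git commit -m "pin dependency versions"
      cargo build --locked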
| kibwen wrote:
| And cargo-vendor is available out of the box as well (although
| it doesn't go quite so far as to add the entire Rust compiler
| toolchain to your repo):
| https://doc.rust-lang.org/cargo/commands/cargo-vendor.html
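|
| If I remember right, `cargo vendor` copies every crate's source
| into a local directory and prints the config that makes cargo
| use it, roughly:
      cargo vendor
      # then, in .cargo/config.toml:
      [source.crates-io]
      replace-with = "vendored-sources"
      [source.vendored-sources]
      directory = "vendor"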
| TillE wrote:
| Yeah, vcpkg.json is the appropriate solution. That plus a
| single git commit hash pins and verifies everything. Make port
| overlays if you need them.
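|
| Something like this, as I understand the manifest format (the
| baseline is a placeholder for a real vcpkg repo commit; fmt and
| zlib are just example ports):
      {
        "dependencies": [ "fmt", "zlib" ],
        "builtin-baseline": "<vcpkg-repo-commit-hash>"
      }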
|
| And you really shouldn't get married to one particular version
| of a compiler toolchain unless you know for sure that something
| is going to break if you update. That just leads to a lot of
| annoyed programmers stuck using ancient tools for no reason.
| KRAKRISMOTT wrote:
| > _Go learn vcpkg and come back to us when you learn why
| everyone does it this way._
|
| IMO it's mostly because of a culture of laziness among systems
| programmers. They like to kick the can down the road and make
| dependency management the distro's problem. Too many embedded
| engineers not being paid enough. When you are building a
| product you shouldn't be wasting your time on this sort of
| "make work" that reinvents the wheel multiple times. Come to
| the machine learning world: everything, including the kitchen
| sink and the graphics drivers, is bundled into the build because
| _otherwise your deployment won't work, and you can't afford to
| pay engineers that cost half a mil a year to spend their time
| fiddling with deployment binaries_.
| moomoo3000 wrote:
| Do you have any links you can recommend for how this is done
| in machine learning projects?
| KRAKRISMOTT wrote:
| The short version is to adopt modern tooling (the vcpkg
| suggestion is an excellent one) and dependency management
| rather than using OS specific tools (unless you are on
| Windows). Part of the reason for this mess is that the
| Unix world operates on an evergreen philosophy where nothing
| is truly backwards compatible out of the box without manual
| intervention. The modern web development and machine
| learning world runs on the opposite doctrine that
| programmer time is the most expensive commodity above all
| else; bandwidth is cheap, storage has a negligible cost,
| and horizontal scaling can sometimes fix compute bound
| problems. Deployment processes are thus optimized for
| reliably reproducible builds. Docker is the classic
| example: bundle literally every dependency possible just to
| ensure that the build always succeeds, anywhere, anytime.
| It has its downsides but it is still one of the most widely
| used deployment methods for a reason.
|
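| A minimal sketch of that bundle-everything style (base image
| and app layout here are hypothetical):
      FROM python:3.11-slim
      COPY requirements.txt .
      RUN pip install --no-cache-dir -r requirements.txt
      COPY . /app
      CMD ["python", "/app/main.py"]
|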
| In the Windows world, you often find desktops with ten
| different copies of the "Windows C++/.NET redistributable"
| (the Windows version of the C++/CLR standard library's
| dynamically loaded artefacts) installed, because each
| individual app has its own specific dependencies and it's
| better for apps to bundle/specify them rather than rely on the
| OS to figure out what to load. The JavaScript, Julia, Rust,
| and Go ecosystems all have first-party support for pulling in
| driver binaries that may be hundreds of gigabytes in size
| (because Nvidia is about as cooperative as a 3-year-old
| child). You don't waste time fiddling with autotools and
| ./configure and praying that everything will run. Just run
| `npm install` and most if not all of the popular
| dependency-heavy libraries will work out of the box.
| hoten wrote:
| vcpkg may expire assets after 1.5 years, so to achieve long-
| term reproducibility you will need to cache your dependencies
| somewhere... Not sure what the expected solution is.
|
| https://github.com/microsoft/vcpkg/pull/30546#issuecomment-1...
| KRAKRISMOTT wrote:
| Microsoft needs to fund Vcpkg more, the developer experience
| (especially installation) still has room for improvement.
| groestl wrote:
| > Every other language has had package managers for this very
| reason
|
| Nah. Package managers are nice, but they only solve Problem 1
| (Usability). If you have any business continuity concerns,
| you'll at least cache and archive the dependencies, and your
| package management will effectively become a binary blob
| extension of your VCS.
| pmorici wrote:
| Isn't this the same kind of attitude as with containers? "I
| don't want to think about or document the dependencies, so
| let's just throw it all in a container full of crap that no one
| fully understands, and it will 'just work' because it worked
| for someone once before."
|
| One of the things that I find very useful is to start from a base
| install of a particular OS and then be very meticulous about
| documenting each package I need to install to get software to
| build. You can even put this into the documentation and automate
| checking that the dependencies are there with the system package
| manager. The dependencies and how you check them will be
| different across different distros and versions but at least you
| had an understanding at one point to work from if you need to
| figure it out going forward.
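|
| A minimal sketch of such a check (assumes a Debian-family
| distro; the package list is a hypothetical stand-in for the
| documented one):
      #!/usr/bin/env python3
      # check_build_deps.py - verify documented build packages are installed
      import subprocess, sys

      REQUIRED = ["build-essential", "libssl-dev", "pkg-config"]

      missing = [p for p in REQUIRED
                 if subprocess.run(["dpkg", "-s", p],
                                   capture_output=True).returncode != 0]
      if missing:
          sys.exit("missing packages: " + " ".join(missing))
      print("all documented build dependencies are installed")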
| malkia wrote:
| This came out this year from Perforce -
| https://www.perforce.com/blog/vcs/what-is-virtual-file-sync-...
| malkia wrote:
| There is also this - https://github.com/microsoft/p4vfs and
| several other solutions - just need to dig around.
| forrestthewoods wrote:
| Microsoft also has VFSforGit. Sadly they abandoned it to
| pursue sparse clones. I'm not sure of the full story there. :(
| eternityforest wrote:
| Seems like all we would need on the VCS side is something like
| Git LFS, but with global chunk-based deduplication and binary
| file patching, serving the actual files over BitTorrent.
|
| The last part is essential, because GitHub's LFS hosting is too
| expensive for anyone to just try out on their own.
|
| But then, on the dev tool side, we would need automated ways to
| get all that stuff in the repo in the first place, and make sure
| that the IDE linting and autocomplete knows about it.
|
| I used to put my python dependencies in a thirdparty folder, and
| have a line that alters the sys.path to point at it.
|
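| Roughly this pattern, if memory serves:
      # vendored deps live in ./thirdparty next to this file
      import os, sys
      sys.path.insert(0, os.path.join(os.path.dirname(__file__), "thirdparty"))

      import requests  # hypothetical dep, now resolved from ./thirdparty first
|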
| I just spent a week getting rid of all that and using PyPI,
| because it didn't play nice with linters, I couldn't swap
| different versions of dependencies with virtualenv, updating
| required manual work, and there was nothing to make sure there
| weren't subtle version conflicts.
|
| I like the idea of deps in VCS, but not as much as I like using
| tools as the devs intended and sticking to the official workflow
| everyone else uses.
| reactordev wrote:
| This is horrible. Just use vcpkg or any number of other C++
| package managers. PyPI exists for a reason. Maven and Gradle
| exist for a reason. NuGet exists for a reason. NPM/Yarn as
| well.
|
| Storing your dependencies with your code ensures you _will_ be
| out of date, vulnerable to whatever vulnerabilities have been
| patched since then, and that your build will produce a
| different hash, so Windows Defender will do a full binary scan
| on you every time. Not to mention an all-hands-on-deck weekend
| holiday to upgrade.
| adrianmonk wrote:
| > _Source code, binary assets, third-party libraries, and even
| compiler toolchains. Everything._
|
| How far down the stack of turtles do you go, though?
|
| Should you include the libc that the compiler requires to run?
| Other parts of the operating system? The kernel? An emulator for
| hardware that can run that kernel?
|
| Eventually, all of those things will stop being produced or easy
| to find. Even if you have the libraries and compiler in version
| control, can you build a game that ran on MS-DOS 5.x or CP/M or
| DEC VMS?
|
| My point is that you may want to just designate a stable
| interface somewhere (a language standard, some libc ABI, etc.) as
| the part you expect to not change. Be aware of what it is, and
| account for it in your plans. If a language is the interface that
| you expect to be stable, then don't upgrade to a compiler version
| that breaks compatibility. Or do upgrade, but do it in an orderly
| manner in conjunction with porting your code over.
|
| If you want your code to be runnable by our descendants 1000
| years from now, you should probably have an incredibly simple VM
| that can be described in a picture and a binary that will run on
| that VM. (In other words, you go down so many turtles that you
| reach "anyone who knows how to code or build machines can
| implement this VM".)
| lebean wrote:
| Everything! Check in your OS's system files! Build a new PC
| with parts identical to the one you're building on! Build many
| of them, one for each commit, and put them all in a warehouse!
| adrianmonk wrote:
| I mean, if you're the military, this might be the right
| answer. If you're making a personal web site, probably not.
| hducy4 wrote:
| This is a strawman. You're extending their argument to an
| extreme so it sounds silly when it's not what has been
| proposed.
|
| The argument is clearly to keep direct dependencies required
| for building in source control so that if you have a working
| build system, you can build the software indefinitely and
| independently from the internet.
|
| Build systems and operating systems don't disappear overnight.
| Leftpad does.
| ClumsyPilot wrote:
| VMs have a reasonably standard interface, and so do container
| images, so either could work as your "everything".
|
| Alternatively, just make a system image of the entire PC, set
| up exactly how you want it, with some common hardware that you
| can expect to be available in 10 years, like a standard Intel
| CPU.
| matisseverduyn wrote:
| "Security" would be a useful benefit/section to add to this post:
|
| A.) If maintainers of your dependencies edited an
| existing/previous version, or
|
| B.) If your dependencies did not pin their dependencies.
|
| For instance, if you installed vue-cli in May of last year from
| NPM with --prefer-offline (using the cache, basically the same
| as checking in your node_modules), you were fine. But because
| vue-cli doesn't pin its dependencies (among them "node-ipc"),
| installing fresh/online would create WITH-LOVE-FROM-AMERICA.txt
| on your desktop [1], which was at the very least a scare, but
| for some, incredibly problematic.
|
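| Pinning exactly, rather than accepting a semver range, is a
| one-character difference in package.json (name and version here
| are hypothetical):
      { "dependencies": { "some-lib": "^1.2.3" } }   <- range: a later 1.x can slip in
      { "dependencies": { "some-lib": "1.2.3" } }    <- exact pin: the bytes you tested
|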
| [1] https://github.com/vuejs/vue-cli/issues/7054
| brendoncarroll wrote:
| I don't think you need to go quite so far as checking gigabytes
| of executables into version control. If you download some
| dependencies at build time, that's fine as long as you know
| exactly what they are ahead of time. "Exactly what they are"
| means a hash, not a name and version tag.
|
| The dockerized build approach is actually a good strategy;
| unfortunately, in practice it's done by image name instead of
| image hash.
|
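| For example (the digest value is a placeholder):
      FROM node:20                # tag: mutable, may point at new bytes tomorrow
      FROM node@sha256:<digest>   # digest: pinned to exact content
|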
| Upgrading dependencies, or otherwise resolving a name and version
| to a hash is a pre-source task, not a from-source task. Maybe it
| can be automated, and a bot can generate pull requests to bump
| the versions, but that happens as a proposed change to the
| source, not in a from-source task like build, test, publish, or
| deploy.
| malkia wrote:
| .exe may belong in p4, .pdb's not! :)
| jmisavage wrote:
| I agree in principle, but you also need the same machine;
| otherwise a new OS and hardware might introduce issues. I
| resurrected an old project that needed to convert LESS to CSS,
| and the Node version required couldn't run on my machine.
| Upgrading to a version that could run introduced filesystem
| changes that broke the packages it was looking for.
|
| Now just imagine businesses in the middle of a platform shift,
| like Macs going from Intel to Apple's own ARM chips. Eventually
| you're going to be missing something, and all this work of
| bundling everything will end up being busy work.
| simonw wrote:
| If you're concerned about bloating your Git repository with non-
| unique binary files (as you should be) a trick I've used in the
| past that worked really well was to have a separate project-
| dependencies Git repository that all of my dependencies lived in.
|
| I was working with Python, so this was effectively a Git-backed
| cache of the various .tar.gz and .whl files that my project
| depended on.
|
| This worked, and it gave me reassurance that I'd still be able to
| build my project if I couldn't download packages from PyPI... but
| to be honest these days I don't bother, because PyPI has proven
| itself robust enough that it doesn't feel worth the extra effort.
|
| I keep meaning to set myself up a simple S3 bucket somewhere with
| my PyPI dependencies mirrored there, just in case I ever need to
| deploy a fix in a hurry while PyPI is having an outage.
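|
| A minimal sketch of that kind of offline mirror, using pip's
| own flags as I remember them:
      pip download -d ./pypi-mirror -r requirements.txt    # fill the mirror while PyPI is up
      pip install --no-index --find-links ./pypi-mirror -r requirements.txt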
| spiffytech wrote:
| Is it going to be a problem for your Windows/Linux developers
| that someone committed a node_modules with macOS-only binaries
| inside?
| matrss wrote:
| Nix (https://nix.dev/) can provide all of this, although in a
| smarter way than just dumping everything in the VCS. Some
| projects use it already to provide a reproducible development
| environment and if done right a clean build is just a `nix-build`
| away.
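|
| A minimal sketch of such a derivation (project name and inputs
| are hypothetical; pinning nixpkgs itself is omitted here):
      # default.nix
      { pkgs ? import <nixpkgs> {} }:
      pkgs.stdenv.mkDerivation {
        pname = "myproject";
        version = "0.1.0";
        src = ./.;
        buildInputs = [ pkgs.zlib ];
      }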
| delotrag wrote:
| Honestly surprised the article didn't mention Nix or Guix.
| Seems like functional package management solves the exact
| problems the author is worried about.
| jbverschoor wrote:
| I think you forgot to include: toolchains
      \win arm
      \linux x86
      \linux arm
      \macos x86
      \macos arm
|
| What I _do_ think is that dependencies should be versioned, and
| their artifacts should be immutable.
|
| Dependency management is not the only thing wrong with gamedev
| dboreham wrote:
| This is both right and wrong:
|
| If you're shipping shrink-wrap product, or equivalent, then you
| should freeze everything in carbonite so you can later recreate
| the thing you released. The article is written as if this is a
| novel idea but it isn't. Decades ago when I worked on products
| shipped on CD it was standard procedure to archive the entire
| build machine to tape and put that in a fire safe. In fact I
| subsequently (decades later) worked for law firms on patent cases
| where they were able to get those backup tapes to prove something
| about the content of software shipped on a specific date.
|
| OTOH, for the typical present-day software project, you don't
| want to re-create the identical build result that someone else
| got six months ago. For example, if it's a JavaScript project,
| then Node is going to be two versions out of date and probably
| full of security bugs from that time. So you actually want
| "code that behaves as expected but built with current
| dependencies and toolchain". Admittedly, experience shows that
| for some languages this is an unreasonable expectation. Some
| level of ongoing maintenance to the codebase is often required
| just to keep it building with security-fixed dependencies.
| jbverschoor wrote:
| Don't forget https://fossil-scm.org/
| mr_tristan wrote:
| The challenge when I see posts like this is that the people in
| charge of building this "check it all in" ecosystem usually
| forget about the developer experience and basically just
| implement a CI system. Cool, you can re-run an old build
| cleanly, which is good, but not enough.
|
| How about commercial IDEs? Cloud environments? A lot of
| developer environments these days include a ton of stuff that
| likely doesn't make sense to check in, either because licensing
| config is annoying or because you're relying on runtime
| services. And all the time engineers spend recreating this on
| their own machines is basically time wasted, which isn't really
| a great solution to pitch to a business.
|
| Side note: I used to work for Perforce until the private equity
| sale. If there were a platform to vendor everything like this,
| it would be Perforce, because you've been able to do this kind
| of thing there for years. AFAIK not many Perforce customers
| ever did this,
| and I don't think it was because Perforce wasn't capable. It's
| just a subtly wicked problem. Getting this right - just check out
| and go across different software development stacks - requires a
| lot of investment. It does look like Perforce has been acquiring
| many other aspects related to the application lifecycle, so in
| theory, they should be better positioned to be the "vendor
| everything on our stack" solution, but I'm not convinced this is
| going to work out well.
|
| Cloud development environment vendors seem to be the best
| positioned as a product for solving this problem, because there
| is less of that "go figure out your DX" aspect left to the
| customer. But the right CDE would have to have a lot of
| enterprise-style controls. This is so new that I'm not sure who
| will get it right first, but my guess is that we'll get to a more
| "development to delivery" integrated environment, and away from a
| hodgepodge of tools managed per project.
| Joel_Mckay wrote:
| True in some situations, but a fundamentally flawed approach to
| FOSS.
|
| Indeed, if you are statically linking noncritical code, then
| for maintainability it is easier to version-lock an entire OS
| with the build tree in a VM. Thus, the same input spits out the
| same audited binary objects every time. In some situations this
| is unavoidable (see the Microsoft or FPGA ecosystems).
|
| However, a shared-object library ecosystem is arguably a key
| part of FOSS when it works properly on *nix, as it is
| fundamentally important to keep versioned interoperability with
| other projects to minimize library RAM footprints etc.
| Additionally, all projects have a shared interest in
| maintaining each other's security, rather than wasting
| resources on every application that ships legacy, stripped,
| obfuscated, vulnerable, leaky static objects.
|
| "Reuse [shared objects], Reduce [resource costs], and Recycle [to
| optimize]."
|
| Sounds like familiar advice... =) like how some git solutions
| don't handle large binary artifacts well or simply implode.
|
| Good luck, and maybe one can prove all those cautionary stories
| wrong. YMMV =)
___________________________________________________________________
(page generated 2023-11-26 23:02 UTC)